forum.alglib.net

ALGLIB forum
All times are UTC


[ 16 posts ]  Page 1 of 2
 Post subject: Discussing redesign of the C++ version
PostPosted: Mon Jun 21, 2010 7:35 am 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
Hello!

Here is an idea that I want to discuss with users of the C++ version of ALGLIB.

ALGLIB is huge. For example, the C++ version includes 208 source files with a total size of 3.4 Mbytes. Of course, all of this can be reduced to just one library file and... hmmm... 104 headers (just 104!) by typing 'build' at the command line. But sometimes users don't want to use the ALGLIB build system; they want to include the source files in their own project (for one reason or another). However, 104 is too many when you are talking about units in your project.

The idea is to merge all ALGLIB units into one large file, alglib.cpp, with four smaller support files. It is much easier to work with 5 files than with 104. So anyone can either use the ALGLIB build system or just include these 5 files in their project.


Another idea I want to discuss is a reimplementation of the ALGLIB computational core in C (the interface will remain C++; only the implementation will change).

Why C? Well, I already have a pure C version of ALGLIB (it was developed for the planned ALGLIB-Python interface). This “Pure-C-ALGLIB” will use multithreading, SSE intrinsics (on x86) or other types of CPU-specific tuning (on non-x86 platforms). Of course, the same optimizations could be applied to the C++ version as well, but it is simpler to optimize just one version (written in C) than two (C and C++). Maintaining both is possible, but it would slow down ALGLIB development.


I know that neither idea is perfect. “One large file” will be approximately 3.4 Mbytes in size, which is not good. On the other hand, which is better: 208 small files or one large one? As for using the C language... The idea of a “C core” may have drawbacks which I don't see now. So I want to discuss these questions with ALGLIB users.

Anyone is welcome to comment!

P.S. I've made two small slides which can be downloaded from
http://www.alglib.net/share/announce/cpp-redesign-1.gif
http://www.alglib.net/share/announce/cpp-redesign-2.gif
They contain a graphical representation of what was said above.


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Mon Jun 21, 2010 5:21 pm 

Joined: Fri Jun 18, 2010 7:17 pm
Posts: 4
Hi,
I think the second strategy you have outlined makes a lot of sense. However, I would like to suggest an additional step (if it is possible). As described in your readme page, you currently break your library down into a few classes of functionality such as Statistics, DSP, etc. My thought is that rather than going all the way to 208 to 1 (or 5) files, let's do it half way 208 to n files (where n is the number of functionalities you offered in your library). My reason is that most users will have specific uses of the library in mind and may not necessarily need all of the functionality at once. This way your users can pick and choose which sub-libraries to download and use.

Thanks


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Mon Jun 21, 2010 9:17 pm 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
barggio wrote:
My thought is that rather than going all the way to 208 to 1 (or 5) files, let's do it half way 208 to n files (where n is the number of functionalities you offered in your library).

It makes sense. For example:
* DataAnalysis package needs only LinAlg and Optimization
* DiffEquations have no dependencies
* FastTransforms - no deps too
* Integration needs only LinAlg
* Interpolation needs only LinAlg and Optimization
* LinAlg - no deps
and so on.

We could have just 12 packages (11 public and one internal) instead of 104 translation units, which would allow users either to use everything or to spend several minutes checking dependencies and use only what they need. We could also partially solve the problem of the "one-big-very-big-when-does-it-open?-file" by splitting the code into several units. And I think we can combine the C core and the C++ interfacing code in one unit (fewer files).
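
As a rough illustration of "checking dependencies", the sketch below walks a dependency table and collects everything a chosen package needs. The table contains only the dependencies listed above (the list ends with "and so on", so it is incomplete), and the names DepTable, listed_packages and required_packages are hypothetical helpers, not part of ALGLIB.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Package name -> packages it depends on directly.
using DepTable = std::map<std::string, std::vector<std::string>>;

// Only the dependencies named in the post above; the real list is longer.
inline DepTable listed_packages() {
    return {
        {"DataAnalysis",   {"LinAlg", "Optimization"}},
        {"DiffEquations",  {}},
        {"FastTransforms", {}},
        {"Integration",    {"LinAlg"}},
        {"Interpolation",  {"LinAlg", "Optimization"}},
        {"LinAlg",         {}},
    };
}

// Adds `pkg` and everything it depends on (directly or indirectly) to `out`.
inline void collect(const DepTable& deps, const std::string& pkg,
                    std::set<std::string>& out) {
    if (!out.insert(pkg).second) return;   // already visited
    auto it = deps.find(pkg);
    if (it == deps.end()) return;          // dependencies not listed: stop here
    for (const auto& d : it->second) collect(deps, d, out);
}

// Returns the set of packages a user would have to add to a project.
inline std::set<std::string> required_packages(const DepTable& deps,
                                               const std::string& pkg) {
    std::set<std::string> out;
    collect(deps, pkg, out);
    return out;
}
```

For example, required_packages(listed_packages(), "Integration") yields Integration and LinAlg. Packages whose dependencies are not in the table (such as Optimization here) are included but not expanded further.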


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Fri Jun 25, 2010 8:17 pm 

Joined: Mon May 10, 2010 3:19 pm
Posts: 8
Sergey,
I would also go with option #2. I like having sub-libraries/packages separated by topic, which can be downloaded and used independently. Personally, I mainly use functions from ortfac.cpp and densesolver.cpp, so I would prefer NOT to download the other files from ALGLIB :-) but that is just my preference.

You have done a great job making alglib so far!
Thanks


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Fri Jul 09, 2010 1:21 pm 

Joined: Tue Jul 06, 2010 8:00 pm
Posts: 21
Sergey, for Windows systems using .NET would you consider the following: Make one DLL with all ALGLIB functions and produce 12 "header" packages for inclusion in user programs, as suggested above. The DLL itself could be written in optimized multi-threaded C, which you have already developed for Python. Once it is translated into the CLR, it will be as fast as it gets and could be used with any languages running on the CLR (C#, F#, C/C++, VB, ...).

The DLL itself can be "large" (are we talking 10MB here?) but that does not matter on modern systems with lots of memory and disk storage. You could consider making "header" files hierarchical, e.g., just like Microsoft does in VB.NET:

Imports Microsoft.Interop.Office
Imports Microsoft.Interop.Office.Excel
...

so that user code is not polluted with too many includes and the user can be as general or as specific as they want in terms of importing the DLL functionality.


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Fri Jul 30, 2010 4:47 am 

Joined: Thu Jul 29, 2010 11:56 pm
Posts: 1
Hi, Sergey!

The two strategies are: (i) a software-technology-based solution and (ii) a user-oriented solution. I don't have real statistics on ALGLIB users' preferences. Perhaps you do. If not, I suggest you run a poll among ALGLIB users. For instance, I belong to the kind of users you mentioned when introducing the subject: I would like to include a small fraction of ALGLIB in my own project. The solution you choose should depend on the share of users like me.

For me, the idea (already mentioned in this discussion) of several packages separated by topics sounds good, in spite of a large intersection of their kernels.

Additionally, I have a general question and a suggestion, both unrelated to the subject of this discussion.

The question is: What is a correct form of reference to ALGLIB in a journal paper, if the paper experiments are based on software that partially includes some codes of ALGLIB?

A suggestion is related to the spline* units. My particular interests include search of local extrema of a function approximated by spline. It is quite easy to determine, for each spline segment (or rectangle, in the bicubic case), whether it contains interior points where the first derivative (or both first partial derivatives, in the bicubic case) is zero. Of course, a user like me can work out from your code how the spline interpolants are represented, then solve a couple of quadratic equations, and finally find the candidate extremal points. However, it would be much better if you slightly extended the subroutines of the spline* units to include this kind of operation.
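
For the one-dimensional case, the per-segment search described above can be sketched as follows. This is only an illustration under stated assumptions: the CubicSegment layout (coefficients of a cubic in the offset from the segment's left node) and the function names are hypothetical and do not reflect how the spline* units store or expose their data.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// One cubic segment s(t) = c0 + c1*t + c2*t^2 + c3*t^3 for t in [0, h],
// where t is the offset from the segment's left node.
// Assumed layout for illustration only, not ALGLIB's representation.
struct CubicSegment {
    double c0, c1, c2, c3, h;
};

// Second derivative s''(t); its sign classifies a stationary point
// (negative: local maximum, positive: local minimum).
inline double second_derivative(const CubicSegment& s, double t) {
    return 2.0 * s.c2 + 6.0 * s.c3 * t;
}

// Returns interior points t in (0, h) with s'(t) = 0, in ascending order,
// i.e. the real roots of 3*c3*t^2 + 2*c2*t + c1. These are only candidates:
// a double root is a stationary point but not an extremum.
inline std::vector<double> stationary_points(const CubicSegment& s) {
    std::vector<double> roots;
    const double a = 3.0 * s.c3, b = 2.0 * s.c2, c = s.c1;
    auto keep = [&](double t) { if (t > 0.0 && t < s.h) roots.push_back(t); };

    if (a == 0.0) {                       // derivative is linear or constant
        if (b != 0.0) keep(-c / b);
        return roots;
    }
    const double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return roots;         // no real roots: segment is monotone
    if (disc == 0.0) { keep(-b / (2.0 * a)); return roots; }

    // Numerically stable quadratic formula (avoids cancellation in -b +/- sqrt).
    const double q = -0.5 * (b + std::copysign(std::sqrt(disc), b));
    double t1 = q / a, t2 = c / q;
    if (t1 > t2) std::swap(t1, t2);
    keep(t1);
    keep(t2);
    return roots;
}
```

Segment endpoints are deliberately excluded here; a full search would also compare the values at the nodes, and the bicubic case needs both partial derivatives to vanish at the same point, which this sketch does not cover.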

Finally, I would like to mention that I appreciate very much the ALGLIB project, thank you.

Georgii Khachaturov


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Sat Jul 31, 2010 7:03 pm 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
gxtarunz wrote:
For me, the idea (already mentioned in this discussion) of several packages separated by topics sounds good, in spite of a large intersection of their kernels.

I think that I'll adopt it in one of the next releases. Users rarely need the whole of ALGLIB, and many want to compile ALGLIB themselves. So reducing the number of compilation units will benefit most ALGLIB users.

gxtarunz wrote:
Hi, Sergey!
The question is: What is a correct form of reference to ALGLIB in a journal paper, if the paper experiments are based on software that partially includes some codes of ALGLIB?

You can include a link to www.alglib.net. There is no publication describing ALGLIB that you can cite, so the URL is the only thing you can reference.

gxtarunz wrote:
A suggestion is related to the spline* units. My particular interests include search of local extrema of a function approximated by spline.

Interesting idea, I've added it to the issue tracker. I plan to do some work on splines in the next several weeks, so it has every chance of finding its way into the next release.


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Tue Jan 04, 2011 8:05 pm 

Joined: Tue Jan 04, 2011 7:59 pm
Posts: 5
Hi

I, for one, would love a pure C version of the library. In general, I tend to keep to plain ANSI C when I write numerical libraries. It starts getting a bit confusing when there is a lot of overloading, especially when you use different packages and/or have different collaborators. In most cases, classes in numerical libraries are used only to make the function prototypes look slightly nicer. In addition, note that recent things like the CUDA GPU language do not understand C++.

Will a pure C version of the library be available any time soon?

TW


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Wed Jan 05, 2011 10:40 am 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
It is possible, but I don't know how to make it user friendly. Just compare

Code:
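{
    minbleicreport rep;        // constructed automatically
    ap::real_2d_array c;       // empty 2D array; the class manages its memory
    c.setlength(1,2);          // allocate 1 row x 2 columns
    c[0][1] = c[0][0];         // element access
}                              // destructors release everything here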


in C++ with its C counterpart

Code:
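{
    minbleicreport rep;
    ae_matrix c;
    _minbleicreport_init(&rep);                     /* explicit initialization */
    ae_matrix_init(&c, 1, 2, DT_REAL);              /* allocate a 1x2 real matrix */
    c.ptr.pp_double[0][1] = c.ptr.pp_double[0][0];  /* access through the typed pointer */
    _minbleicreport_clear(&rep);                    /* explicit cleanup frees the memory */
    ae_matrix_clear(&c);
}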


All ALGLIB structures contain pointers to dynamically allocated memory. So you have to initialize each structure with its init function and clear it with its clear function.

You have to do the same with arrays. And you can't use something like double a[3][2] for two-dimensional arrays, because all arrays passed to ALGLIB functions must be pointers to an ae_matrix (or ae_vector) structure, which contains a pointer and some service information and supports reallocation of the array (something you can't do with double a[3][2]). Not user friendly, right? :(
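
To illustrate how a C++ interface layer could hide this init/clear pairing on top of a C core, here is a minimal RAII sketch. It is an assumption-laden illustration only: demo_matrix, demo_matrix_init, demo_matrix_clear and the Matrix class are hypothetical stand-ins and are not the real ae_matrix API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// C-style core: a hypothetical stand-in, NOT the real ae_matrix API.
struct demo_matrix {
    double*     data;   // rows*cols values, row-major, heap-allocated
    std::size_t rows;
    std::size_t cols;
};

// Allocates zero-filled storage (rows and cols are assumed to be positive).
// Returns false if the allocation fails.
inline bool demo_matrix_init(demo_matrix* m, std::size_t rows, std::size_t cols) {
    m->data = static_cast<double*>(std::calloc(rows * cols, sizeof(double)));
    m->rows = rows;
    m->cols = cols;
    return m->data != nullptr;
}

// Releases the storage and leaves the structure empty.
inline void demo_matrix_clear(demo_matrix* m) {
    std::free(m->data);
    m->data = nullptr;
    m->rows = m->cols = 0;
}

// C++ interface layer: the constructor calls init, the destructor calls clear,
// so user code never has to pair them by hand.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols) {
        if (!demo_matrix_init(&m_, rows, cols)) throw std::bad_alloc();
    }
    ~Matrix() { demo_matrix_clear(&m_); }

    // Copying is disabled to keep the sketch short (one owner per buffer).
    Matrix(const Matrix&) = delete;
    Matrix& operator=(const Matrix&) = delete;

    double& operator()(std::size_t i, std::size_t j) {
        return m_.data[i * m_.cols + j];
    }

    // Pointer to hand to functions of the C core.
    demo_matrix* c_ptr() { return &m_; }

private:
    demo_matrix m_;
};
```

With such a wrapper, user code shrinks back to something like `Matrix c(1, 2); c(0, 1) = c(0, 0);`, while the C core still receives a pointer to the structure via c_ptr().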

Do you think that such a pure C library would be useful?


 Post subject: Re: Discussing redesign of the C++ version
PostPosted: Wed Jan 05, 2011 3:08 pm 

Joined: Tue Jan 04, 2011 7:59 pm
Posts: 5
I agree that the native data structure may look rather clumsy. To me, the payoff is clarity and not having to worry about incompatibility with certain systems in scientific computing that may only deal with ANSI C; the latter is much more of a hassle for an average researcher. For example, if I want to encode algorithms in an FPGA or GPU, the translator or the compiler add-on will typically only deal with straight C. If you look at the new TESLA-based supercomputer in China and the new 'personal supercomputer' boxes based on TESLAs, I suspect a lot more CPU-intensive algorithms in the future may be implemented on non-traditional multi-purpose CPU units. (see + later)

Personally, I've always relied on passing double ** around internally, as it is clearer in my own code. I tend to build a Util library to copy data into the appropriate structure, so most of the init and copying can be hidden away in simple C functions. Given that most packages have their own quirks, some copying will be necessary anyway as soon as you start using a couple of different ones. Each of your matrix objects has an ae_matrix structure anyway; perhaps passing that around instead of a straight double * would at least keep your dimension information.
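
A helper of the kind described above might look like the sketch below, which copies between raw row pointers and a structure that carries its own dimensions. The names DenseMatrix, from_row_pointers and to_row_pointers are hypothetical, chosen for illustration; this is not an ALGLIB interface, and it is written in C++ rather than plain C to match the other examples in this thread.

```cpp
#include <cstddef>
#include <vector>

// A matrix that keeps its dimensions next to its data (hypothetical type).
struct DenseMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;   // row-major storage

    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    const double& at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Copies a rows x cols array given as row pointers into a DenseMatrix.
// The caller must guarantee that each src[i] points to at least cols values.
inline DenseMatrix from_row_pointers(const double* const* src,
                                     std::size_t rows, std::size_t cols) {
    DenseMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.data.reserve(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        m.data.insert(m.data.end(), src[i], src[i] + cols);
    return m;
}

// Copies the matrix back into caller-owned rows (each at least m.cols long).
inline void to_row_pointers(const DenseMatrix& m, double* const* dst) {
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            dst[i][j] = m.at(i, j);
}
```

The copy costs one pass over the data in each direction, which is the trade-off mentioned above: the caller keeps plain double ** internally, while the library side always knows the dimensions.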

BTW, I'm really grateful for the Alglib project; you've done a good job. It looks good - I'm in the process of migrating from NR and GSL. I will buy the commercial license in a month after I run through some tests.

(+) Actually, I usually insist on a straight C implementation of core numerical libraries for my research staff, because I generally do not trust people to build efficient compiled code in C++. I have seen a lot of 'elegant'-looking code that requires me to look at tens of files to be sure I understand what an operator/function actually does, because some people have gone overboard with overloading, inheritance and polymorphism. Also, some schools actually encourage widespread use of templates, which can slow down the compiled code if the user does not pay attention to compiler settings. As most people are not careful, and large-scale collaboration typically degrades code to the lowest competence level in the group, I would prefer the simplest language feature set that gets the job done. Ugly-looking code only offends if you have to look at it over and over again because there is a mistake; if it works, ugly code is hardly visible.

