How large are your matrices? Most ALGLIB algorithms are optimized for moderately sized or large data, where the allocation penalty is small compared with the cost of processing the data itself. I suppose that you call ALGLIB functions many times on very small amounts of data. Is that right?
I agree that an efficient algorithm should use dynamic allocations as infrequently as possible. Algorithms implemented within the last several years (the optimizers, for example) cache as much as possible, but some older algorithms (linear algebra, for example) still do not store dynamically allocated arrays between calls.
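Just to make the idea concrete, here is a minimal sketch of what "storing temporaries between calls" means from a user's point of view. It is not ALGLIB internals - the names Workspace and process_block are placeholders for illustration only:
Code:
// Generic illustration of caching a dynamically allocated workspace
// between calls instead of reallocating it on every call.
#include <vector>
#include <cstddef>

struct Workspace
{
    std::vector<double> tmp;          // scratch buffer reused across calls

    void reserve(std::size_t n)       // grow only when a larger problem arrives
    {
        if(tmp.size()<n)
            tmp.resize(n);
    }
};

void process_block(const double *data, std::size_t n, Workspace &ws)
{
    ws.reserve(n);                    // no allocation on repeated same-size calls
    for(std::size_t i=0; i<n; i++)
        ws.tmp[i] = data[i]*2.0;      // placeholder for the real computation
}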
However, this has only a moderate influence on their performance. I've run experiments with rmatrixsvd() - it achieves quite good speedup with 4 threads even on small matrices:
Code:
MATRIX SIZE    SPEEDUP
        512       3.89
         64       3.88
         32       3.69
         16       2.94
All timings were done on my Intel Core2 with 4 cores and 4 worker threads. You can see that the speedup decreases for small matrices (due to the dynamic allocation penalty), but even at N=32 calls to rmatrixsvd() have only moderate dynamic allocation overhead. So I suppose that either you are working with very small matrices (below N=32), or the performance deterioration you've seen has other causes.
The attachment contains the source code I used for this test. ALGLIB 3.3.0 was used.
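If you want to reproduce the measurement quickly, here is a rough single-threaded sketch of the same idea. The attached program is more complete and compares 1 vs. 4 worker threads; this cut-down version only measures per-call cost through the public rmatrixsvd() interface, and constants like the matrix size and round count are arbitrary:
Code:
#include <cstdio>
#include <ctime>
#include <cstdlib>
#include "linalg.h"

using namespace alglib;

int main()
{
    const int n = 32;        // small matrix, where allocation overhead matters most
    const int rounds = 1000; // arbitrary number of repetitions

    // fill a random n x n matrix
    real_2d_array a;
    a.setlength(n, n);
    for(int i=0; i<n; i++)
        for(int j=0; j<n; j++)
            a[i][j] = 2.0*rand()/(double)RAND_MAX-1.0;

    real_1d_array w;
    real_2d_array u, vt;

    // time repeated SVD calls (full U and VT requested, additional memory allowed)
    clock_t t0 = clock();
    for(int k=0; k<rounds; k++)
        rmatrixsvd(a, n, n, 2, 2, 2, w, u, vt);
    clock_t t1 = clock();

    printf("N=%d: %.3f ms per call\n", n, 1000.0*(t1-t0)/CLOCKS_PER_SEC/rounds);
    return 0;
}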
In any case, you've raised an interesting question, and I think that the ability to store temporaries between calls to linear algebra functions (and some other functions too) will be added in one of the next ALGLIB releases (3.4.0 is almost ready and there is not much time left before release, so it may have to wait for 3.5.0).