forum.alglib.net
http://forum.alglib.net/

PLINQ/Task vs Rbf
http://forum.alglib.net/viewtopic.php?f=2&t=643
Page 1 of 2

Author:  maelstorm [ Thu Nov 15, 2012 11:54 am ]
Post subject:  PLINQ/Task vs Rbf

Hi, is it possible (i.e. thread-safe) to evaluate an RBF model in parallel in the C# version of ALGLIB? For example:
Code:
var res = points.AsParallel().Select(x =>
{
  // rbfcalc() returns its output through a double[] (one entry per output dimension)
  double[] result;
  alglib.rbfcalc(model, x, out result);
  return result[0];
});

Author:  Sergey.Bochkanov [ Fri Nov 16, 2012 7:01 am ]
Post subject:  Re: PLINQ/Task vs Rbf

No, it is not thread-safe.

Each rbfcalc() call involves a search for the nearest neighbors of a point. This search uses fields of the model as temporaries, which are modified during the search. So two parallel calls to rbfcalc() for the same model will lead to two searches being executed simultaneously and sharing the same internal buffers.

Author:  Sergey.Bochkanov [ Fri Nov 16, 2012 7:02 am ]
Post subject:  Re: PLINQ/Task vs Rbf

BTW, this does not mean that you can't call rbfcalc() on two completely different structures at the same time - you just can't use one RBF model from two threads.

Author:  maelstorm [ Fri Nov 16, 2012 5:06 pm ]
Post subject:  Re: PLINQ/Task vs Rbf

Sergey.Bochkanov wrote:
No, it is not thread-safe.

I have a very large dataset (~100,000 points, sometimes even ~1,000,000).
Can I still achieve a significant performance gain if I somehow make a copy
of the RBF model structure? Is it possible to make a shallow copy of the fields that don't
change during evaluation of the model and a deep copy of the fields that do change?

Author:  Sergey.Bochkanov [ Sat Nov 17, 2012 8:06 am ]
Post subject:  Re: PLINQ/Task vs Rbf

You can copy the structure by serializing it and then performing several unserializations from the same source. However, your task is really huge - you will need more than 100MB just to store the kd-tree search structure used by the RBF model. So you may face serious issues with memory size, bus bandwidth and CPU cache size if you try to parallelize computations on a multi-core computer. However, it may be worth trying :)

Regarding "smart" copying - it is possible, and it should work, but I can't say in a few words which fields should be copied and which should not. There are two main places where memory is consumed: the rbfmodel fields which store centers/weights, and the internals of the kdtree search structure. You can examine the ALGLIB source and determine which fields are changed only at the initialization of the structure - these fields can be shared between different copies of the model.
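A minimal sketch of the serialize-then-unserialize idea described above, so that each worker thread evaluates its own private copy of the model. The helper names (PerThreadEval, Evaluate, makeCopy, eval) are hypothetical and not part of ALGLIB. The ALGLIB calls appear only in the comment, and their signatures are assumed from the C# interface, so check them against your ALGLIB version.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical helper (not part of ALGLIB): evaluates points in parallel,
// giving every worker thread its own model instance so that no internal
// buffers are shared between threads.
public static class PerThreadEval
{
    public static double[] Evaluate<TModel>(
        double[][] points,
        Func<TModel> makeCopy,                  // builds one independent model copy
        Func<TModel, double[], double> eval)    // evaluates one point on one copy
    {
        // ThreadLocal calls makeCopy lazily, once per thread that touches .Value.
        using (var local = new ThreadLocal<TModel>(makeCopy))
        {
            return points
                .AsParallel()
                .AsOrdered()                    // keep results in input order
                .Select(p => eval(local.Value, p))
                .ToArray();                     // force evaluation before Dispose
        }
    }
}

// Assumed usage with ALGLIB (signatures assumed, scalar output, ny = 1):
//
//   string s;
//   alglib.rbfserialize(model, out s);         // serialize once
//   double[] res = PerThreadEval.Evaluate(
//       pts,
//       () => { alglib.rbfmodel m; alglib.rbfunserialize(s, out m); return m; },
//       (m, x) => { double[] y; alglib.rbfcalc(m, x, out y); return y[0]; });
```

Each thread pays the full memory cost of one model copy, so the caveats above about memory size and bandwidth still apply. Whether this gives a real speed-up for a large model is something to measure rather than assume.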

Author:  maelstorm [ Sat Nov 17, 2012 10:10 am ]
Post subject:  Re: PLINQ/Task vs Rbf

Thanks. BTW, are there any plans to parallelize ALGLIB? IMHO the lack of parallel versions of the algorithms will be a huge drawback of ALGLIB in the future, for example if you have a Core i7 CPU and 32 GB of RAM...

Author:  Sergey.Bochkanov [ Sat Nov 17, 2012 12:28 pm ]
Post subject:  Re: PLINQ/Task vs Rbf

Yes, we plan to release a multicore version of ALGLIB in the first months of 2013. The lack of parallelism really is a huge drawback, because many important algorithms can benefit greatly from parallelization. We've already implemented a framework for scheduling tasks between different cores (for some reasons we do not want to use the SMP features of .NET 4 or OpenMP); it is now in the testing phase.

Author:  maelstorm [ Sun Nov 18, 2012 1:40 pm ]
Post subject:  Re: PLINQ/Task vs Rbf

Hi again, one more question about memory consumption: I'm trying to run the following test code on Win7 x64 with 4GB of RAM, and the program crashes with an out-of-memory error somewhere during model construction:
Code:
            alglib.rbfmodel model;
            int N = 1000*400;

            // long arithmetic: the int product would overflow for N above roughly 536,000
            long expected_mem = 250L * N * (sizeof(double) + 2 * sizeof(int));
            Console.WriteLine( "Expected memory consumption = {0:N}", expected_mem );

            int nx = 3;
            int ny = 1;
            alglib.rbfcreate( nx, ny, out model );

            var rnd = new Random();
            var data = new double[N, nx + ny];
            var pts = new double[N][];
            var res = new double[N][];

            for ( int i = 0; i < N; ++i ) {
                data[i, 0] = -1 + rnd.NextDouble() * 2;
                data[i, 1] = -1 + rnd.NextDouble() * 2;
                data[i, 2] = -1 + rnd.NextDouble() * 2;
                data[i, 3] = Math.Sin( data[i, 0] ) * Math.Cos( data[i, 1] ) * data[i, 2];

                pts[i] = new double[nx];
                pts[i][0] = -1 + rnd.NextDouble() * 2;
                pts[i][1] = -1 + rnd.NextDouble() * 2;
                pts[i][2] = -1 + rnd.NextDouble() * 2;

                res[i] = new double[ny];
            }

            Console.WriteLine( "Data initialized, {0:N} points", N );

            var mem_0 = GC.GetTotalMemory( true );
            var s = Stopwatch.StartNew();

            alglib.rbfreport rep;
            alglib.rbfsetpoints( model, data );
            alglib.rbfbuildmodel( model, out rep );

            Console.WriteLine( "Model built, time = {0:N} ms", s.ElapsedMilliseconds );
            GC.Collect();
            var mem_1 = GC.GetTotalMemory( true );
            Console.WriteLine( "Allocated {0:N} bytes", mem_1 - mem_0 );
           
            s = Stopwatch.StartNew();
            for ( int j = 0; j < N; ++j ) {
                alglib.rbfcalc( model, pts[j], out res[j] );
            }
            Console.WriteLine( "Computation done! time = {0:N} ms", s.ElapsedMilliseconds );

            Console.ReadLine();


As far as I understand, 4GB of RAM should be sufficient to create a model from 400,000 points? Or not?

Author:  Sergey.Bochkanov [ Mon Nov 19, 2012 4:54 pm ]
Post subject:  Re: PLINQ/Task vs Rbf

Hello! I have not had enough time to solve this issue today, but I will investigate it tomorrow and report the results in this topic.

Author:  Sergey.Bochkanov [ Tue Nov 20, 2012 7:11 am ]
Post subject:  Re: PLINQ/Task vs Rbf

The problem is that the algorithm needs a lot of memory for its internal calculations. You have 400,000 points, and each has many neighbors whose influence must be accounted for. In your setting it allocates a double[] array with 500,000,000 elements (about 4 GB at 8 bytes per element) to store the weights matrix. The .NET Framework has an upper limit of 2GB on the size of a single array, so you cannot allocate an array that large under .NET, even when you compile for a 64-bit architecture.
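The arithmetic behind that limit can be checked with a small sketch. The class and method names here are hypothetical, and the 2GB figure is taken as exactly 2^31 bytes for simplicity (the usable limit is slightly lower in practice).

```csharp
using System;

// Hypothetical helper: checks whether a double[] with a given element count
// would fit under a 2GB per-array limit.
public static class ArraySizeCheck
{
    // Simplifying assumption: treat the limit as exactly 2^31 bytes.
    public const long TwoGB = 2L * 1024 * 1024 * 1024;

    // Bytes needed for the payload of a double[] with elementCount elements.
    public static long BytesForDoubles(long elementCount)
    {
        return elementCount * sizeof(double);
    }

    public static bool FitsInSingleArray(long elementCount)
    {
        return BytesForDoubles(elementCount) <= TwoGB;
    }
}
```

For the 500,000,000-element array mentioned above, BytesForDoubles gives 4,000,000,000 bytes, roughly twice the limit, so FitsInSingleArray returns false regardless of how much physical RAM is installed.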

All times are UTC