BLEIC performance in v3.7.0 vs v3.6.0

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

I am running an identical BLEIC optimization problem and noticed that the performance in AlgLib v3.7.0 is worse than that in v3.6.0. Specifically, the execution time scales much worse with the number of variables. My optimization problem runs in two stages:

#1: 108 variables, 96 data points
v3.6.0: iterations = 402, execution time = 157 to 158 ms
v3.7.0: iterations = 132, execution time = 793 to 795 ms

#2: 10 variables, 63 data points (selected from a larger set for each step), 96 steps
v3.6.0: max iterations = 428, avg iterations = 61, total execution time = 27 to 28 ms
v3.7.0: max iterations = 60, avg iterations = 22, total execution time = 34 to 35 ms

So, the increase in execution time is more pronounced with a large number of variables (approx. 5x for 108 variables) than with a small number of variables (approx. 26% for 10 variables). Hence the scaling problem, even though the number of iterations in v3.7.0 is smaller than in v.3.6.0.

This is a steep price to pay for a better handling of degenerate constraints. Comments?

Sergey.Bochkanov · **Joined:** Fri May 07, 2010 7:06 am **Posts:** 930

How many linear (non-boundary) constraints do you have? Can you post your code here, so I will be able to study your case and use it for benchmarking?

The problem with old (3.6.0) algorithm is that sometimes it was unable to correctly deactivate constraints. And it was almost impossible to repair it - I personally spent several weeks fixing different flaws in the logic - just to found more flaws introduced by previous fixes. New algorithm suggested by Elvira Illarionova is much more stable. However, I think that its performance can be significantly improved, and it will be top priority issue.

Sergey.Bochkanov · **Joined:** Fri May 07, 2010 7:06 am **Posts:** 930

I think that this performance drop comes from new constraint activation/deactivation code. However, it would be good to hear about constraints count in your problems (#1 and #2) - and even more good to study actual code.

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

In each optimization stage, the number of constraints = the number of variables (N) + one:
Each of these variables must be >= 0, and their sum must be = 1.
For simplicity, I start each optimization stage with each x = 1/N (equal fractional values).

I cannot post the actual data or the objective function here, but the shape of this function is smooth N-dimensional quadratic-like, so there is no problem with finding a global minimum (and no, I cannot use a straight quadratic optimization).
The problem is that I have to repeat both optimization stages tens of thousands of times, so a 5x(+) performance penalty for a large number of variables in v3.7.0 is really painful.

BTW, with AlgLib v3.6.0, I experimented with CUDA to calculate the gradient of my objective function in parallel mode on the GPU, while the main CPU runs the BLEIC algorithm. My machine has one of the most powerful Intel CPUs (3820) and NVIDIA GPUs (GTX 680), both with plenty of memory for the size of the problem. It turned out that the CUDA-augmented computation was actually slower than the pure CPU implementation. A person with more experience in CUDA explained that this was a result of the gradient computation being memory-bound (i.e. CUDA works best when the number of memory accesses is minimized). I will have to try CUDA with AlgLib 3.7.0 to see if the smaller number of iterations results in a different relative performance vs. the CPU-only implementation (presumably, with fewer iterations, fewer gradient computations are required).

Sergey.Bochkanov · **Joined:** Fri May 07, 2010 7:06 am **Posts:** 930

Did you specified all these N+1 constraints with one call of minbleicsetlc? Or you used combination of setbc and setlc?

General linear constraints are hard to process - much harder than boundary ones. Old algorithm was replaced because of problems with handling of general constraints. And you have N boundary constraints - and only one general linear constraint. So it is quite natural to specify these N constraints as boundary ones, and save a lot of time.

I've tried to reproduce your issue with some simple test function. With N boundary constraints specified by setbc and one constrain specified with setlc algorithm was done in less than 10ms. The problem was not reproduced. However, with N+1 general linear constraint it took about 2000ms to converge to the same solution, which is similar to your results.

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

Thanks for looking into this problem. I use this call with both v3.6.0 and v3.7.0: minbleicsetlc(state, c, ct). I guess it sets N+1 linear constraints, which, as you indicated, negatively affects performance. Let me change it to a combination of N boundary + 1 linear constraints, I will test this approach and report results soon.

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

After changing the N linear constraints to boundary constraints and retaining one linear constraint (as discussed above), I got the following for my two optimization stages:

#1: 108 variables, 96 data points
v3.7.0: iterations = 131, execution time = 10 ms (was 794 ms on average)

#2: 10 variables, 63 data points (selected from a larger set for each step), 96 steps
v3.7.0: max iterations = 60, avg iterations = 23, total execution time = 23 to 24 ms (was 34.5 ms on average)

So, the performance of stage #1 improved by a staggering factor of about 80x, and stage #2 by a still significant 1.47x.
I did not go back to test the same improvement with v3.6.0, but could do so if there is a sufficient interest from those who follow this thread.

Sergey.Bochkanov · **Joined:** Fri May 07, 2010 7:06 am **Posts:** 930

I think that 3.6.0 will show similar characteristics, may be a bit different numbers, but very similar to 3.7.0, so I think you don't have to spend your time just to get almost same result.

BTW, it is likely that 3.6.0 will be a bit faster, because it uses single-step constraint activation/deactivation algorithm where 3.7.0 uses multi-step one. A bit faster, but not 5x time faster, as it was with N non-boundary constraints.

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

Thanks, Sergey. One more question: what is the best way to convert from a one-dimensional array of doubles to real_1d_array that BLEIC is using?

Here is how I currently do it:

double *x; // this is an argument to my optimization function
int nx; // number of independent variables in x
...
alglib_impl::ae_vector xvec;
xvec.cnt = nx;
xvec.datatype = alglib_impl::DT_REAL;
xvec.data.ptr = NULL;
xvec.ptr.p_double = x;
real_1d_array xa(&xvec);
...
// BLEIC can now use xa

I also noticed that the last line can be replaced with this one without any apparent side effects:
real_1d_array xa = &xvec;

alusr1 · **Joined:** Tue Jul 06, 2010 8:00 pm **Posts:** 21

Given the EpsG problem (see the separate thread "BLEIC stopping criteria in v3.7.0 vs v3.6.0"), I went back to v3.6.0 and changed my optimization to use N boundary + 1 linear constraints. Here are the results:

#1: 108 variables, 96 data points
v3.6.0 with N+1 linear constraints: iterations = 402, execution time = 157 to 158 ms
v3.6.0 with N boundary + 1 linear constraints: iterations = 396, execution time = 5 ms

#2: 10 variables, 63 data points (selected from a larger set for each step), 96 steps
v3.6.0 with N+1 linear constraints: max iterations = 428, avg iterations = 61, total execution time = 27 to 28 ms
v3.6.0 with N boundary + 1 linear constraints: max iterations = 400, avg iterations = 59, total execution time = 19 ms

So, performance improvement in stage #1 (a relatively large number of independent variables, but one step) was about 32x.
Improvement in stage #2 (a smaller number of independent variables, but a large number of steps) was about 1.45x.

Until the EpsG problem in v3.7.0 is addressed, I will have to stick with v3.6.0 -- it is faster, too.

forum.alglib.net

Forum rules

BLEIC performance in v3.7.0 vs v3.6.0

Who is online