forum.alglib.net

ALGLIB forum
It is currently Sun Dec 16, 2018 2:08 am

All times are UTC


Forum rules


1. This forum can be used for discussion of both ALGLIB-related and general numerical analysis questions
2. This forum is English-only - postings in other languages will be removed.



Post new topic Reply to topic  [ 10 posts ] 
Author Message
 Post subject: High-dimensional QP problem yields unreliable results
PostPosted: Wed Mar 07, 2018 9:46 am 
Offline

Joined: Thu Jan 25, 2018 7:38 am
Posts: 11
Hi,

Numeric instability of problem? Double precision insufficient? Does AlgLib have a quadruple precision version?

An SVM problem with 770 training points yields a 770-dimensional QP problem.

Using DenseAU, with a variety of scaling factors - the problem converges - yet the solution to the QP problem doesn't provide a good solution to the SVM one. In addition, when reformulating the problem only with the training points for which the Lagarangian multipliers were non-zero (a great deal of them are, as expected, zero - the non-zero ones are points on the boundaries of two hyper-planes that separate the "1"s from the "-1" ones) - the solution comes out completely different...

On the other hand, when taking a subset of the training points (every other point from a continuum of points - 385 points) - the QP problem converges and the sub-problem (with the points for which the Lagrangian multipliers were non-zero) provides nearly the exact same solution, and the sub-SVM problem is solved exactly - and the sub-solution nearly resolves the full SVM problem.

Numeric instability of problem? Double precision insufficient? Does AlgLib have a quadruple precision version?
Any suggestions would be appreciated...

Thanks,

Zev


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Wed Mar 07, 2018 10:29 am 
Offline
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 835
Can you send your code to sergey.bochkanov at alglib dot net ?

I will try to find out, whether it is problem of the solver - or something else.


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Tue Mar 13, 2018 11:33 am 
Offline

Joined: Thu Jan 25, 2018 7:38 am
Posts: 11
Hi,

Providing the data is impossible with the 60K limit and no attachments...

The AlgLib part of my application does nothing different than the demo code - see below.
'ir_svm' just creates the (symmetric) N-D Hessian and (N+1)-D constraints matrices etc.

The constraints are: all Xi > 0 and sigma(Xi * Yi, i=1..N) = 0,
where N, the number of training points, was at most 770 - but might need to be larger,
Xi, i=1..N, are the LaGarange multipliers - and the QP unknowns,
where Yi, i=1..N, are either 1 or -1 according to Ti being a 'true' or 'false' training point.
Scales were set uniformly for all N - as 0.1, 1, 10, 100, 1000 - with no appreciable effect.
No initial guess for Xi is available - so initialized with ones.

The QP having converged successfully, W, defining the the M-dimensional (58) hyperplane separating the 'true' training points from the 'false' ones can be calculated - and a scalar b such that
(1)
<w . Ti> - b should be >= 1 for 'true' M-dimensional Training points and
<w . Ti> - b should be <= -1 for 'false' Training points;
<w . Ti> here is the inner product between the two M-dimensional vectors.

The distance between the two sets of training points is 2 / ||W||.

The result of the QP yields small but non-zero distance between the two sets -
but the conditions (1) don't hold correctly - the training points for which the Lagrange multiplies came out identically zero full fill - not (1) - rather, in most cases, a weaker version
(2)
<w . Ti> - b should be >= f1 for 'true' Training points and
<w . Ti> - b should be <= -f2 for 'false' Training points;
where 0< f1 & f2 < 1
- and in isolated cases not even (2) holds...


int _tmain(int argc, _TCHAR* argv[])
{
try
{
double *pa, *pb, *pc, *ps, *px;
int *pct;

int N = ir_svm(&pa, &pb, &pc, &pct, &ps, &px);

real_2d_array a;
a.setcontent(N, N, pa);

real_1d_array b;
b.setcontent(N, pb);

real_1d_array s;
s.setcontent(N, ps);

real_2d_array c;
c.setcontent(N + 1, N + 1, pc);

integer_1d_array ct;
ct.setcontent(N + 1, pct);

real_1d_array x;
x.setcontent(N, px);

minqpstate state;
minqpreport rep;

// create solver, set quadratic/linear terms
minqpcreate(N, state);
minqpsetquadraticterm(state, a);
minqpsetlinearterm(state, b);
minqpsetlc(state, c, ct);

// NOTE: for convex problems you may try using minqpsetscaleautodiag()
// which automatically determines variable scales.
minqpsetscale(state, s);

//
// Solve problem with BLEIC-based QP solver.
//
// This solver is intended for problems with moderate (up to 50) number
// of general linear constraints and unlimited number of box constraints.
//
// Default stopping criteria are used.
//

//minqpsetalgobleic(state, 0.0, 0.0, 0.0, 0);
minqpsetalgodenseaul(state, 1.0e-15, 5.0e+4, 10);

minqpoptimize(state);
minqpresults(state, x, rep);
printf("inneriterationscount %u\n"
"ncholesky %u, nmv %u\n"
"outeriterationscount %u\n"
"terminationtype %d\n"
, rep.inneriterationscount, rep.ncholesky, rep.nmv, rep.outeriterationscount, rep.terminationtype);
}
catch(ap_error)
{
printf("Exception!\n");
getchar();
}
return 0;
}


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Wed Mar 14, 2018 10:32 am 
Offline

Joined: Thu Jan 25, 2018 7:38 am
Posts: 11
Hi,

Here is an SVM problem with 8 training points - 4 'true' and 4 'false'
that gets formulated as the following QP problem:
Hessian (A)
0 1 2 3 4 5 6 7
0 9008.04 8692.77 6441.76 5085.08 -6532.93 -6773.31 -7054.06 -7549.81
1 8692.77 8406.34 6238.24 4929.02 -6087.81 -6389.49 -6718.46 -7243.83
2 6441.76 6238.24 4639.70 3671.38 -4409.45 -4668.51 -4938.08 -5348.06
3 5085.08 4929.02 3671.38 2918.12 -3363.90 -3608.07 -3826.17 -4142.27
4 -6532.93 -6087.81 -4409.45 -3363.90 8897.43 7848.32 7156.12 6844.05
5 -6773.31 -6389.49 -4668.51 -3608.07 7848.32 7433.34 7016.67 6873.04
6 -7054.06 -6718.46 -4938.08 -3826.17 7156.12 7016.67 6897.34 7022.35
7 -7549.81 -7243.83 -5348.06 -4142.27 6844.05 6873.04 7022.35 7418.02

Constraints (C) + (CT)
alpha0 alpha1 alpha2 alpha3 alpha4 alpha5 alpha6 alpha7 right side type
0 1 0 0 0 0 0 0 0 0 1
1 0 1 0 0 0 0 0 0 0 1
2 0 0 1 0 0 0 0 0 0 1
3 0 0 0 1 0 0 0 0 0 1
4 0 0 0 0 1 0 0 0 0 1
5 0 0 0 0 0 1 0 0 0 1
6 0 0 0 0 0 0 1 0 0 1
7 0 0 0 0 0 0 0 1 0 1
8 1 1 1 1 -1 -1 -1 -1 0 0

For such a small problem one might expect to get a more exact SVM solution than:
<Yi> <W . Ti> - b SVM Decision
0 1 1.123992 1
1 1 1.200366 1
2 1 1.123993 1
3 1 1.183616 1
4 -1 -1.579777 -1
5 -1 -1.197507 -1
6 -1 -0.978675 -1
7 -1 -0.876008 -1

where, as described in the previous post, W & b define the separating hyperplane.
It is expected that <W.Ti> - b should have exact 1 and -1 results for some of the training points (those defining the region boundaries - those for which the LaGrange multipliers came out non zero).
Not only does this not occur with reasonable accuracy - but for the 'false' points we have values greater than -1...

This result persists even with hundreds of iterations and various scalings - here globally 1.e4 .

Does this QP problem look so inherently unstable as to yield such disappointing results?

Increasing the number of training points and things get much worse - complete rubbish for 685 points (although the QP problem converges)...

Zev

PS. your patience and interest in this issue is greatly appreciated - thanks!


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Thu Mar 15, 2018 10:41 am 
Offline
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 835
Hi!

Thank you for providing such short and easy to debug example! I've started investigating the problem, will report as soon as first results are available!


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Thu Mar 15, 2018 10:48 am 
Offline
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 835
BTW, can you report also exact values of dataset points? There are only 8 ones, it should be easy to post.

If dimension count is too large, you can send data file to sergey.bochkanov at alglib dot net


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Thu Mar 15, 2018 11:38 am 
Offline

Joined: Thu Jan 25, 2018 7:38 am
Posts: 11
Hi,

I found a mathematical bug in my implementation.

Now, I get exact 1/-1 <W.Ti> - b values for the support vectors for the 8-D problem...
But for a 600-D+ problem - not...

I will increase dimensionality and see for which N things go wrong.

Thanks for your help & patience.

Zev


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Thu Mar 15, 2018 12:23 pm 
Offline
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 835
Hi!

I noticed that you specified non-negativity constraints as general linear ones, with minqpsetlc() call. Thus, you have N+1 general linear constraint. However, it is suboptimal from both performance and precision points of view. Box constraints are handled in a special way, much faster and precisely than general linear ones.

I recommend you to split your constraint into box constrained part, set by minqpsetbc() call, and general linear one - just one linear constraint set with minqpsetlc() function. You will get much better performance and precision.

Another suggestion is to set non-zero lambda coefficient for the SVM problem.


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Thu Mar 15, 2018 1:03 pm 
Offline

Joined: Thu Jan 25, 2018 7:38 am
Posts: 11
Hi,

Thanks for the suggestions.
- How can I use box constraints for the N "Xi >= 0"? I'll have to guess what upper boundaries to specify...
- I'm not familiar with the "non-zero lambda coefficient" - I'll look into it...

BTW, with my current formulation, the <W.Ti> - b holds with great precision for the support vectors up to N=150; for N=186 I get the expected +1/-1 only with a precision of 2 decimal places - and thereafter things diverge...

Zev


Top
 Profile  
 
 Post subject: Re: High-dimensional QP problem yields unreliable results
PostPosted: Fri Mar 16, 2018 2:37 pm 
Offline
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 835
You can specify +INF as upper bound, ALGLIB will correctly handle signed infinities in the boundary values. Or just 1.0e20, if you want so.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 10 posts ] 

All times are UTC


Who is online

Users browsing this forum: Bing [Bot] and 4 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group