forum.alglib.net http://forum.alglib.net/ 

High-dimensional QP problem yields unreliable results http://forum.alglib.net/viewtopic.php?f=2&t=3843 
Page 1 of 1 
Author:  zovnat [ Wed Mar 07, 2018 9:46 am ] 
Post subject:  High-dimensional QP problem yields unreliable results 
Hi, Numeric instability of the problem? Double precision insufficient? Does AlgLib have a quadruple-precision version?

An SVM problem with 770 training points yields a 770-dimensional QP problem. Using DENSE-AUL, with a variety of scaling factors, the problem converges - yet the solution to the QP problem doesn't provide a good solution to the SVM one. In addition, when reformulating the problem only with the training points for which the Lagrangian multipliers were nonzero (a great deal of them are, as expected, zero - the nonzero ones are the points on the boundaries of the two hyperplanes that separate the "+1"s from the "-1"s), the solution comes out completely different... On the other hand, when taking a subset of the training points (every other point from a continuum of points - 385 points), the QP problem converges, the subproblem (with the points for which the Lagrange multipliers were nonzero) provides nearly the exact same solution, the sub-SVM problem is solved exactly - and the sub-solution nearly resolves the full SVM problem.

Any suggestions would be appreciated... Thanks, Zev 
Author:  Sergey.Bochkanov [ Wed Mar 07, 2018 10:29 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Can you send your code to sergey.bochkanov at alglib dot net? I will try to find out whether it is a problem of the solver or something else. 
Author:  zovnat [ Tue Mar 13, 2018 11:33 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi, Providing the data is impossible with the 60K limit and no attachments... The AlgLib part of my application does nothing different from the demo code - see below. 'ir_svm' just creates the (symmetric) NxN Hessian, the (N+1)-row constraint matrices, etc. The constraints are: all Xi >= 0 and sigma(Xi * Yi, i=1..N) = 0, where N, the number of training points, was at most 770 (but might need to be larger); Xi, i=1..N, are the Lagrange multipliers - and the QP unknowns; Yi, i=1..N, are either +1 or -1 according to Ti being a 'true' or 'false' training point. Scales were set uniformly for all N - as 0.1, 1, 10, 100, 1000 - with no appreciable effect. No initial guess for Xi is available, so they were initialized with ones.

Once the QP has converged successfully, W, defining the M-dimensional (58) hyperplane separating the 'true' training points from the 'false' ones, can be calculated, along with a scalar b, such that

(1) <W . Ti> - b >= 1 for 'true' M-dimensional training points and <W . Ti> - b <= -1 for 'false' training points;

<W . Ti> here is the inner product between the two M-dimensional vectors. The distance between the two sets of training points is 2 / ||W||. The result of the QP yields a small but nonzero distance between the two sets, but conditions (1) don't hold correctly: the training points for which the Lagrange multipliers came out identically zero fulfill not (1) but, in most cases, only a weaker version

(2) <W . Ti> - b >= f1 for 'true' training points and <W . Ti> - b <= -f2 for 'false' training points, where 0 < f1, f2 < 1

- and in isolated cases not even (2) holds... 
int _tmain(int argc, _TCHAR* argv[])
{
    try
    {
        double *pa, *pb, *pc, *ps, *px;
        int *pct;
        int N = ir_svm(&pa, &pb, &pc, &pct, &ps, &px);

        real_2d_array a;
        a.setcontent(N, N, pa);
        real_1d_array b;
        b.setcontent(N, pb);
        real_1d_array s;
        s.setcontent(N, ps);
        real_2d_array c;
        c.setcontent(N + 1, N + 1, pc);
        integer_1d_array ct;
        ct.setcontent(N + 1, pct);
        real_1d_array x;
        x.setcontent(N, px);

        minqpstate state;
        minqpreport rep;

        // create solver, set quadratic/linear terms
        minqpcreate(N, state);
        minqpsetquadraticterm(state, a);
        minqpsetlinearterm(state, b);
        minqpsetlc(state, c, ct);

        // NOTE: for convex problems you may try using minqpsetscaleautodiag(),
        // which automatically determines variable scales.
        minqpsetscale(state, s);

        //
        // Solve problem with BLEIC-based QP solver.
        //
        // This solver is intended for problems with a moderate (up to 50) number
        // of general linear constraints and an unlimited number of box constraints.
        //
        // Default stopping criteria are used.
        //
        //minqpsetalgobleic(state, 0.0, 0.0, 0.0, 0);
        minqpsetalgodenseaul(state, 1.0e-15, 5.0e+4, 10);
        minqpoptimize(state);
        minqpresults(state, x, rep);

        printf("inneriterationscount %d\n"
               "ncholesky %d, nmv %d\n"
               "outeriterationscount %d\n"
               "terminationtype %d\n",
               (int)rep.inneriterationscount, (int)rep.ncholesky,
               (int)rep.nmv, (int)rep.outeriterationscount,
               (int)rep.terminationtype);
    }
    catch(ap_error)
    {
        printf("Exception!\n");
        getchar();
    }
    return 0;
}
Author:  zovnat [ Wed Mar 14, 2018 10:32 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi, Here is an SVM problem with 8 training points - 4 'true' and 4 'false' - that gets formulated as the following QP problem:

Hessian (A)
         0        1        2        3        4        5        6        7
0  9008.04  8692.77  6441.76  5085.08  6532.93  6773.31  7054.06  7549.81
1  8692.77  8406.34  6238.24  4929.02  6087.81  6389.49  6718.46  7243.83
2  6441.76  6238.24  4639.70  3671.38  4409.45  4668.51  4938.08  5348.06
3  5085.08  4929.02  3671.38  2918.12  3363.90  3608.07  3826.17  4142.27
4  6532.93  6087.81  4409.45  3363.90  8897.43  7848.32  7156.12  6844.05
5  6773.31  6389.49  4668.51  3608.07  7848.32  7433.34  7016.67  6873.04
6  7054.06  6718.46  4938.08  3826.17  7156.12  7016.67  6897.34  7022.35
7  7549.81  7243.83  5348.06  4142.27  6844.05  6873.04  7022.35  7418.02

Constraints (C) + (CT)
   alpha0 alpha1 alpha2 alpha3 alpha4 alpha5 alpha6 alpha7  right side  type
0     1      0      0      0      0      0      0      0        0        1
1     0      1      0      0      0      0      0      0        0        1
2     0      0      1      0      0      0      0      0        0        1
3     0      0      0      1      0      0      0      0        0        1
4     0      0      0      0      1      0      0      0        0        1
5     0      0      0      0      0      1      0      0        0        1
6     0      0      0      0      0      0      1      0        0        1
7     0      0      0      0      0      0      0      1        0        1
8     1      1      1      1     -1     -1     -1     -1        0        0

For such a small problem one might expect to get a more exact SVM solution than:

    Yi   <W . Ti> - b   SVM decision
0   +1      1.123992        +1
1   +1      1.200366        +1
2   +1      1.123993        +1
3   +1      1.183616        +1
4   -1     -1.579777        -1
5   -1     -1.197507        -1
6   -1     -0.978675        -1
7   -1     -0.876008        -1

where, as described in the previous post, W & b define the separating hyperplane. It is expected that <W . Ti> - b should come out exactly +1 and -1 for some of the training points (those defining the region boundaries - those for which the Lagrange multipliers came out nonzero). Not only does this not occur with reasonable accuracy, but for the 'false' points we have values greater than -1... This result persists even with hundreds of iterations and various scalings - here globally 1.e4. Does this QP problem look so inherently unstable as to yield such disappointing results? Increasing the number of training points makes things much worse - complete rubbish for 685 points (although the QP problem converges)...

Zev

PS. Your patience and interest in this issue is greatly appreciated - thanks! 
Author:  Sergey.Bochkanov [ Thu Mar 15, 2018 10:41 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi! Thank you for providing such a short and easy-to-debug example! I've started investigating the problem and will report as soon as the first results are available! 
Author:  Sergey.Bochkanov [ Thu Mar 15, 2018 10:48 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
BTW, can you also report the exact values of the dataset points? There are only 8 of them, so they should be easy to post. If the dimension count is too large, you can send a data file to sergey.bochkanov at alglib dot net 
Author:  zovnat [ Thu Mar 15, 2018 11:38 am ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi, I found a mathematical bug in my implementation. Now I get exact +1/-1 values of <W . Ti> - b for the support vectors of the 8-point problem... But for a 600+ dimensional problem - not... I will increase dimensionality and see for which N things go wrong. Thanks for your help & patience. Zev 
Author:  Sergey.Bochkanov [ Thu Mar 15, 2018 12:23 pm ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi! I noticed that you specified the nonnegativity constraints as general linear ones, with the minqpsetlc() call. Thus, you have N+1 general linear constraints. However, this is suboptimal from both the performance and the precision points of view. Box constraints are handled in a special way, much faster and more precisely than general linear ones. I recommend that you split your constraints into a box-constrained part, set by the minqpsetbc() call, and a general linear part - just one linear constraint set with the minqpsetlc() function. You will get much better performance and precision. Another suggestion is to set a nonzero lambda coefficient for the SVM problem. 
Author:  zovnat [ Thu Mar 15, 2018 1:03 pm ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
Hi, Thanks for the suggestions. - How can I use box constraints for the N "Xi >= 0" constraints? I'll have to guess what upper boundaries to specify... - I'm not familiar with the "nonzero lambda coefficient" - I'll look into it... BTW, with my current formulation, <W . Ti> - b holds with great precision for the support vectors up to N=150; for N=186 I get the expected +1/-1 only to a precision of 2 decimal places - and thereafter things diverge... Zev 
Author:  Sergey.Bochkanov [ Fri Mar 16, 2018 2:37 pm ] 
Post subject:  Re: High-dimensional QP problem yields unreliable results 
You can specify +INF as the upper bound; ALGLIB correctly handles signed infinities in the boundary values. Or just 1.0e20, if you prefer. 