forum.alglib.net :: View topic - kmeansgenerate clarification

forum.alglib.net http://forum.alglib.net/

kmeansgenerate clarification http://forum.alglib.net/viewtopic.php?f=2&t=122

Author:	g40 [ Mon Nov 15, 2010 7:29 pm ]
Post subject:	kmeansgenerate clarification
Can I make sure I've understood the various parameters for kmeansgenerate() correctly?

1. The final output parameter is a 1D array 'xyc'. Does this indicate which row of data has ended up in which cluster? I.e., given that xyc ends up with N rows and we have 3 clusters:

row0 [cluster index 0..2]
row1 [cluster index 0..2]
...
rowN-1 [cluster index 0..2]

2. The real_2d_array 'c' is described as 'array[0..NVars-1,0..K-1].matrix whose columns store cluster's centers'. Can I use this data to calculate each cluster's Within Sum of Squares (WSS)? Or is it already present?

My initial data source is a SQL Server database. Pseudo-code is as follows:

Code:
// SQL data source
SQLDS ds;
// kmeansgenerate() data array
alglib::real_2d_array arr;
// for each row in ds and for each column in ds
arr[row][col] = ds[row][col]
// set up kmeans++
int k = 3; // 3 clusters
int iterations = 10; // is this iterations or retries?
alglib::ae_int_t info = 0;
alglib::real_2d_array c;
alglib::integer_1d_array xyc;
// run kmeans
alglib::kmeansgenerate(arr,rows,cols,k,iterations,info,c,xyc);
// check clustering

Many thanks
Jerry

Author:	Sergey.Bochkanov [ Mon Nov 15, 2010 8:27 pm ]
Post subject:	Re: kmeansgenerate clarification
1. Yes, it stores cluster indices (from 0 to K-1), with the clusters themselves stored in C. XYC is guaranteed to be consistent with XY and C, except in situations where it is hard to decide which cluster a point belongs to (i.e. there are several clusters at equal distance from the point in question). In such situations one of the clusters is chosen at random (factors which influence the choice: order of appearance, numerical errors during calculation of distances).

2. No, WSS is not calculated by ALGLIB; you should calculate it yourself.

3. "iterations" are actually "restarts" (this parameter is called "restarts"): the number of attempts to find a better clustering with different starting distributions.
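Since ALGLIB does not report WSS, it can be computed from the data rows, the centers and the cluster indices. The following is a minimal C++ sketch, not ALGLIB code: within_sum_of_squares is a hypothetical helper, and it uses plain std::vector containers instead of the ALGLIB array types so that it stands alone. It assumes the layout quoted in the question, with centers stored in the columns of c and cluster indices in xyc running from 0 to K-1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ALGLIB.
// xy  : NPoints rows, each holding the NVars coordinates of one data point
// c   : NVars rows by K columns; column j holds the center of cluster j
// xyc : NPoints cluster indices in 0..K-1, one per row of xy
// Returns one within-cluster sum of squares per cluster.
std::vector<double> within_sum_of_squares(
    const std::vector<std::vector<double>>& xy,
    const std::vector<std::vector<double>>& c,
    const std::vector<int>& xyc,
    std::size_t k)
{
    assert(xy.size() == xyc.size());
    std::vector<double> wss(k, 0.0);
    for (std::size_t i = 0; i < xy.size(); ++i) {
        assert(xyc[i] >= 0);
        const std::size_t cluster = static_cast<std::size_t>(xyc[i]);
        assert(cluster < k);
        assert(xy[i].size() == c.size());
        // Accumulate the squared Euclidean distance from point i to its center.
        for (std::size_t v = 0; v < xy[i].size(); ++v) {
            const double d = xy[i][v] - c[v][cluster];
            wss[cluster] += d * d;
        }
    }
    return wss;
}
```

Summing the returned values gives the total WSS for the clustering.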

Author:	g40 [ Mon Nov 15, 2010 8:58 pm ]
Post subject:	Re: kmeansgenerate clarification
Sergey, thanks for the excellent feedback. Is there a good way to suggest new ALGLIB features?

Author:	Sergey.Bochkanov [ Tue Nov 16, 2010 6:01 am ]
Post subject:	Re: kmeansgenerate clarification
A lot of ways :) This forum, e-mail, or the issue tracker at bugs.alglib.net (it is used to track both bugs and features).

Author:	Reef [ Thu Mar 10, 2011 8:15 am ]
Post subject:	Re: kmeansgenerate clarification
How do I know how many iterations I need to use for proper generation? Does the algorithm stop by itself when the cluster centers don't change, or change too little?

Author:	Sergey.Bochkanov [ Thu Mar 10, 2011 9:15 am ]
Post subject:	Re: kmeansgenerate clarification
This algorithm has two nested loops:

1. The inner loop starts from a random arrangement of clusters, tries to improve it, and stops when nothing changes.
2. The outer loop moves to another random arrangement of clusters, runs the inner loop, and compares its result with the best clustering found so far.

You can't control the number of iterations in the inner loop; the only thing you can do is choose the number of outer iterations. If you are pretty sure that your problem is simple, you can live with one outer iteration. You can try 5, 10 or larger numbers and see how that changes the quality of the clustering. But everything is problem dependent.
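The two nested loops described above can be sketched as follows. This is an illustrative C++ outline under stated assumptions, not ALGLIB's implementation: the names Clustering, run_once and kmeans_with_restarts are hypothetical, starting centers are picked as random distinct data points (plain random seeding rather than k-means++), and "best" is assumed to mean the lowest WSS. The seed parameter is also an addition of this sketch, so that a run can be repeated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Point = std::vector<double>;

struct Clustering {
    std::vector<Point> centers;  // K centers, one Point per cluster
    std::vector<int> labels;     // cluster index (0..K-1) for each input point
    double wss = std::numeric_limits<double>::infinity();
};

// Squared Euclidean distance between two points of equal dimension.
static double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t v = 0; v < a.size(); ++v) s += (a[v] - b[v]) * (a[v] - b[v]);
    return s;
}

// Inner loop: refine one starting arrangement until no point changes cluster.
static Clustering run_once(const std::vector<Point>& xy, std::vector<Point> centers) {
    Clustering r;
    r.centers = std::move(centers);
    r.labels.assign(xy.size(), -1);
    const int k = static_cast<int>(r.centers.size());
    bool changed = true;
    while (changed) {
        changed = false;
        // Assignment step: move each point to its nearest center.
        // On ties the point keeps its current cluster.
        for (std::size_t i = 0; i < xy.size(); ++i) {
            int best = r.labels[i] >= 0 ? r.labels[i] : 0;
            for (int j = 0; j < k; ++j) {
                if (sq_dist(xy[i], r.centers[j]) < sq_dist(xy[i], r.centers[best])) best = j;
            }
            if (best != r.labels[i]) { r.labels[i] = best; changed = true; }
        }
        // Update step: each center becomes the mean of its points.
        // A cluster that lost all its points keeps its previous center.
        for (int j = 0; j < k; ++j) {
            Point sum(xy[0].size(), 0.0);
            std::size_t count = 0;
            for (std::size_t i = 0; i < xy.size(); ++i) {
                if (r.labels[i] != j) continue;
                for (std::size_t v = 0; v < sum.size(); ++v) sum[v] += xy[i][v];
                ++count;
            }
            if (count == 0) continue;
            for (std::size_t v = 0; v < sum.size(); ++v) {
                r.centers[j][v] = sum[v] / static_cast<double>(count);
            }
        }
    }
    r.wss = 0.0;
    for (std::size_t i = 0; i < xy.size(); ++i) {
        r.wss += sq_dist(xy[i], r.centers[r.labels[i]]);
    }
    return r;
}

// Outer loop: try `restarts` random starts and keep the lowest-WSS result.
Clustering kmeans_with_restarts(const std::vector<Point>& xy, std::size_t k,
                                int restarts, unsigned seed) {
    assert(k >= 1 && k <= xy.size() && restarts >= 1);
    std::mt19937 rng(seed);
    std::vector<std::size_t> idx(xy.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    Clustering best;
    for (int r = 0; r < restarts; ++r) {
        // Pick k distinct data points as the starting centers.
        std::shuffle(idx.begin(), idx.end(), rng);
        std::vector<Point> centers;
        for (std::size_t j = 0; j < k; ++j) centers.push_back(xy[idx[j]]);
        Clustering candidate = run_once(xy, std::move(centers));
        if (candidate.wss < best.wss) best = std::move(candidate);
    }
    return best;
}
```

Only the restarts argument plays the role of the parameter discussed in this thread; the inner loop has no iteration count and simply runs until the labels stop changing.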

Author:	MikeS [ Wed Oct 29, 2014 12:22 pm ]
Post subject:	Re: kmeansgenerate clarification
Hello Sergey,

I would like to use the procedure KMeansGenerate, but I am not sure that I have understood the description of its input parameters. I have a symmetric square matrix A of size N=22; each element Aij is the distance between two objects, and the diagonal elements are Aii=0. I have specified the input parameters like this:

Code:
N := 22;
K := 5; // desired number of clusters, K>=1
NPoints := N; // dataset size, NPoints>=K
NVars := N; // number of variables, NVars>=1

Is this OK? In what case is "NPoints" not equal to "NVars"?

I would also like to know whether I can obtain a specific solution for my dataset. Random functions are used in the code:

Code:
I := RandomInteger(NPoints);

and

Code:
V := RandomReal;

I am using my own dataset, but the results differ from run to run. I set:

Code:
Restarts := 3; // number of restarts, Restarts>=1

Thanks.

Author:	Sergey.Bochkanov [ Wed Oct 29, 2014 1:28 pm ]
Post subject:	Re: kmeansgenerate clarification
Hello! You cannot use k-means on a dataset specified by a distance matrix. k-means works only with (a) explicitly given datasets and (b) Euclidean distance. Because it is k-MEANS, it needs specific points which can be averaged. And its stability/convergence is guaranteed only for the Euclidean metric.
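To illustrate why explicit points are needed, here is a minimal C++ sketch of the averaging step. The helper name centroid is hypothetical and not part of ALGLIB. The input is a dataset with one row per point and one column per variable, so NPoints and NVars are independent of each other (for example, 4 points with 2 variables each). A distance matrix provides no such coordinates to average.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean of a set of points.
// points has NPoints rows; each row holds the NVars coordinates of one point.
std::vector<double> centroid(const std::vector<std::vector<double>>& points) {
    assert(!points.empty());
    std::vector<double> mean(points[0].size(), 0.0);
    for (const auto& p : points) {
        assert(p.size() == mean.size());
        // Sum each coordinate separately across all points.
        for (std::size_t v = 0; v < p.size(); ++v) mean[v] += p[v];
    }
    for (double& m : mean) m /= static_cast<double>(points.size());
    return mean;
}
```

For the four corners of a square, (0,0), (2,0), (2,2) and (0,2), the function returns the point (1,1).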

Author:	dieting [ Mon Dec 29, 2014 5:31 am ]
Post subject:	Re: kmeansgenerate clarification
If you are pretty sure that your problem is simple, you can live with one outer iteration. You can try 5, 10 or larger numbers and see how it changes quality of clustering. But everything is problem dependent.???

All times are UTC