forum.alglib.net

ALGLIB forum
All times are UTC


 Post subject: kmeansgenerate clarification
PostPosted: Mon Nov 15, 2010 7:29 pm 

Joined: Mon Nov 15, 2010 12:58 pm
Posts: 7
Can I make sure I've understood the various parameters for kmeansgenerate() correctly?

1. The final output parameter is a 1D array 'xyc'. Does this indicate which row of data has ended up in which cluster? I.e., given that xyc ends up with N rows and we have 3 clusters:

row0 [cluster index 0..2]
row1 [cluster index 0..2]
...
rowN-1 [cluster index 0..2]

2. The real_2d_array 'c' is described as 'array[0..NVars-1,0..K-1].matrix whose columns store cluster's centers'. Can I use this data to calculate each cluster's Within Sum of Squares (WSS)? Or is it already present?

My initial data source is a SQL server database. Pseudo-code is as follows:

Many thanks

Jerry

Code:
// SQL data source
SQLDS ds;
// kmeansgenerate() data array: one row per point, one column per variable
alglib::real_2d_array arr;
arr.setlength(rows, cols);
// for each row in ds and for each column in ds
arr[row][col] = ds[row][col];
// set up k-means++
int k = 3;                       // 3 clusters
int iterations = 10;          // is this iterations or retries?
alglib::ae_int_t info = 0;
alglib::real_2d_array c;
alglib::integer_1d_array xyc;
// run kmeans
alglib::kmeansgenerate(arr,rows,cols,k,iterations,info,c,xyc);
// check clustering


 Post subject: Re: kmeansgenerate clarification
PostPosted: Mon Nov 15, 2010 8:27 pm 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
1. Yes, it stores cluster indices (from 0 to K-1), with the clusters themselves stored in C. XYC is guaranteed to be consistent with XY and C except in situations where it is hard to decide which cluster a point belongs to (i.e. there are several clusters at equal distance from the point in question). In such situations one of the clusters is chosen at random (factors which influence the choice: order of appearance, numerical errors during calculation of distances).

2. No WSS is calculated by ALGLIB, you should calculate it yourself.

3. "iterations" are actually "restarts" (this parameter is called "restarts") - number of attempts to find better clustering with different starting distributions.
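Regarding point 2, a minimal sketch of computing WSS by hand is given here. It uses plain std::vector rather than ALGLIB's array types so that it stands alone; the function name within_cluster_ss and the Matrix alias are made up for illustration. It assumes the layout quoted earlier in this thread (points in rows of XY, cluster centers in columns of C), which may differ in other ALGLIB versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout assumptions (taken from the description quoted in this thread):
//   xy  : NPoints rows, NVars columns (one point per row)
//   c   : NVars rows, K columns (one cluster center per column)
//   xyc : NPoints entries, each a cluster index in 0..K-1
using Matrix = std::vector<std::vector<double>>;

// Returns K values: the within-cluster sum of squares of each cluster.
std::vector<double> within_cluster_ss(const Matrix& xy, const Matrix& c,
                                      const std::vector<int>& xyc) {
    assert(xy.size() == xyc.size());
    assert(!c.empty());
    const std::size_t nvars = c.size();
    const std::size_t k = c[0].size();
    std::vector<double> wss(k, 0.0);
    for (std::size_t i = 0; i < xy.size(); ++i) {
        const std::size_t cluster = static_cast<std::size_t>(xyc[i]);
        assert(cluster < k && xy[i].size() == nvars);
        for (std::size_t j = 0; j < nvars; ++j) {
            // The center of `cluster` is column `cluster` of c.
            const double d = xy[i][j] - c[j][cluster];
            wss[cluster] += d * d;
        }
    }
    return wss;
}
```

Summing the returned values gives the total WSS for the whole clustering. With ALGLIB's own arrays the same double loop should carry over, provided the layout matches the one assumed above.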


 Post subject: Re: kmeansgenerate clarification
PostPosted: Mon Nov 15, 2010 8:58 pm 

Joined: Mon Nov 15, 2010 12:58 pm
Posts: 7
Sergey.

Thanks for the excellent feedback.

Is there a good way to suggest new alglib features?


 Post subject: Re: kmeansgenerate clarification
PostPosted: Tue Nov 16, 2010 6:01 am 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
A lot of ways :) This forum, e-mail, or the issue tracker at bugs.alglib.net (it is used to track both bugs and features).


 Post subject: Re: kmeansgenerate clarification
PostPosted: Thu Mar 10, 2011 8:15 am 

Joined: Thu Mar 10, 2011 8:11 am
Posts: 2
How do I know how many iterations I need to use for proper generation?
Does the algorithm stop by itself when the cluster centers don't change or change too little?


 Post subject: Re: kmeansgenerate clarification
PostPosted: Thu Mar 10, 2011 9:15 am 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
This algorithm has two nested loops:
1. inner loop starts from random arrangement of clusters, tries to improve it, stops when nothing changes
2. outer loop moves to another random arrangement of clusters, runs inner loop and compares its results with best clustering found so far

You can't control the number of iterations in the inner loop; the only thing you can do is choose the number of outer iterations. If you are pretty sure that your problem is simple, you can live with one outer iteration. You can try 5, 10 or larger numbers and see how that changes the quality of the clustering. But everything is problem dependent.
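The two nested loops described above can be sketched roughly as follows. This is not ALGLIB's code: it is a plain Lloyd-style k-means written for illustration, the names (Clustering, inner_loop, kmeans_with_restarts) are invented, and it assumes each restart begins from k randomly chosen data points, which is likely simpler than the k-means++ seeding mentioned earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<double>;

struct Clustering {
    std::vector<Point> centers;   // K cluster centers
    std::vector<int> assignment;  // cluster index (0..K-1) for each point
    double wss = std::numeric_limits<double>::infinity();  // total within-cluster SS
};

double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

// Inner loop: improve one starting arrangement until no point changes cluster.
Clustering inner_loop(const std::vector<Point>& xy, std::vector<Point> centers) {
    const std::size_t n = xy.size(), k = centers.size(), nvars = xy[0].size();
    std::vector<int> assign(n, -1);
    for (bool changed = true; changed;) {
        changed = false;
        // Assignment step: nearest center, ties go to the lowest index.
        for (std::size_t i = 0; i < n; ++i) {
            int best = 0;
            for (std::size_t c = 1; c < k; ++c)
                if (sq_dist(xy[i], centers[c]) < sq_dist(xy[i], centers[best]))
                    best = static_cast<int>(c);
            if (best != assign[i]) { assign[i] = best; changed = true; }
        }
        // Update step: each non-empty cluster's center becomes the mean of its points.
        std::vector<Point> sum(k, Point(nvars, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            ++count[assign[i]];
            for (std::size_t j = 0; j < nvars; ++j) sum[assign[i]][j] += xy[i][j];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)
                for (std::size_t j = 0; j < nvars; ++j)
                    centers[c][j] = sum[c][j] / static_cast<double>(count[c]);
    }
    Clustering result;
    result.wss = 0.0;
    for (std::size_t i = 0; i < n; ++i) result.wss += sq_dist(xy[i], centers[assign[i]]);
    result.centers = centers;
    result.assignment = assign;
    return result;
}

// Outer loop: try `restarts` random starting arrangements and keep the best one.
Clustering kmeans_with_restarts(const std::vector<Point>& xy, std::size_t k,
                                int restarts, std::mt19937& rng) {
    assert(k >= 1 && xy.size() >= k && restarts >= 1);
    std::vector<std::size_t> idx(xy.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    Clustering best;
    for (int r = 0; r < restarts; ++r) {
        // Assumption of this sketch: start from k distinct data points picked at random.
        std::shuffle(idx.begin(), idx.end(), rng);
        std::vector<Point> start;
        for (std::size_t c = 0; c < k; ++c) start.push_back(xy[idx[c]]);
        Clustering candidate = inner_loop(xy, start);
        if (candidate.wss < best.wss) best = candidate;
    }
    return best;
}
```

In this sketch "best" means lowest total within-cluster sum of squares; that criterion is an assumption, chosen because it is the usual k-means objective. Calling kmeans_with_restarts with restarts set to 1, 5 and 10 on the same data and comparing the returned wss is one way to see whether extra restarts help for a given problem.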


 Post subject: Re: kmeansgenerate clarification
PostPosted: Wed Oct 29, 2014 12:22 pm 

Joined: Wed Oct 29, 2014 11:58 am
Posts: 1
Hello Sergey,

I would like to use the procedure KMeansGenerate.

But I am not sure that I have understood the description of this procedure's input parameters.
I have a symmetric square matrix A of size N=22; each element Aij is the distance between two objects, and the diagonal elements are Aii=0.
I have specified the input parameters like this:

Code:
N:=22;
K := 5;       //   desired number of clusters, K>=1
NPoints := N;   //   dataset size, NPoints>=K
NVars   := N;      //   number of variables, NVars>=1



Is it OK? In what case is "NPoints" not equal to "NVars"?

Also, I would like to know whether I can obtain a specific solution for my dataset. Random functions are used in the code:
Code:
I := RandomInteger(NPoints);

and
Code:
V := RandomReal;


I have been using my dataset, but the results are different.
I set:
Code:
Restarts := 3;   //   number of restarts, Restarts>=1


Thanks.


 Post subject: Re: kmeansgenerate clarification
PostPosted: Wed Oct 29, 2014 1:28 pm 
Site Admin

Joined: Fri May 07, 2010 7:06 am
Posts: 927
Hello!

You cannot use k-means on a dataset specified by a distance matrix. k-means works only with (a) explicitly given datasets and (b) Euclidean distance. Because it is k-MEANS, it needs specific points which can be averaged. And its stability/convergence is guaranteed only for the Euclidean metric.
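To illustrate the "points which can be averaged" requirement, here is a small sketch of the averaging step on an explicitly given dataset. The function name centroid is made up for this example. Rows are points (NPoints) and columns are variables (NVars), so the two sizes are independent: four points in the plane give NPoints=4 and NVars=2. A distance matrix contains no such coordinates, so there is nothing for this step to average.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean of a set of points given by their coordinates.
// xy has NPoints rows (one per point) and NVars columns (one per variable).
std::vector<double> centroid(const std::vector<std::vector<double>>& xy) {
    assert(!xy.empty());
    const std::size_t nvars = xy[0].size();
    std::vector<double> mean(nvars, 0.0);
    for (const auto& p : xy) {
        assert(p.size() == nvars);
        for (std::size_t j = 0; j < nvars; ++j) mean[j] += p[j];
    }
    for (double& m : mean) m /= static_cast<double>(xy.size());
    return mean;
}
```

For the four corner points (0,0), (2,0), (2,4) and (0,4) this returns (1,2), the center of the rectangle.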


 Post subject: Re: kmeansgenerate clarification
PostPosted: Mon Dec 29, 2014 5:31 am 

Joined: Mon Dec 29, 2014 2:30 am
Posts: 1
If you are pretty sure that your problem is simple, you can live with one outer iteration. You can try 5, 10 or larger numbers and see how it changes quality of clustering. But everything is problem dependent.???

