forum.alglib.net

ALGLIB forum
 Post subject: Seed PCA calculation with shared basis vectors
Posted: Sat Jul 01, 2017 7:51 pm
I have many similar datasets, and I currently use PCA to reduce the dimensionality of each one, storing the basis vectors and variance values for each dataset. I would like to exploit the similarity between the datasets to reduce how much data I end up storing, by running the PCA algorithm on the combined data from all datasets and generating a small number of basis vectors that would be shared by all of them.

My thought on how to implement this is as follows:
    Create a large matrix of all sample points
    Run truncated subspace PCA to generate N basis vectors
    No need to retain variance values; just the N basis vectors

For each smaller dataset:
    Create matrix of small dataset sample points (same number of rows as original)
    Project dataset onto original basis vectors, reducing the dimensionality somewhat. (How could I calculate the variance values from this?)
    Run PCA algorithm on dataset to generate additional basis vectors and variance values.
    Store the additional basis vectors and the full set of variance values for each smaller dataset.

Does this seem like a reasonable way to approach the problem? I think that the subset basis vectors will be orthogonal to the shared basis vectors, but this is not a requirement for my application. Thanks!
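On the question of variance values after projection: one common approach (a sketch, not something stated in this thread) is to take the variance of the projection coefficients along each shared vector. The example below assumes NumPy, samples stored as rows, an orthonormal basis stored as columns, and mean-centering of each dataset. The function name `variance_along_basis` and the synthetic data are hypothetical, made up for illustration.

```python
import numpy as np

def variance_along_basis(X, basis):
    """Variance of X along each column of an orthonormal basis.

    X:     (n_samples, n_features) data, samples as rows (assumption)
    basis: (n_features, k) matrix with orthonormal columns (assumption)
    """
    Xc = X - X.mean(axis=0)          # center the dataset (assumption)
    coeffs = Xc @ basis              # coordinates of each sample in the basis
    return coeffs.var(axis=0, ddof=1)

# Synthetic stand-in data with a different scale per feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

# Hypothetical shared basis: the first two coordinate axes
basis = np.eye(5)[:, :2]
shared_var = variance_along_basis(X, basis)
print(shared_var)
```

Because the shared vectors are generally not the subset's own principal axes, the coefficients may still be correlated with each other, so these numbers are per-direction variances rather than eigenvalues of the subset's covariance matrix.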


 Post subject: Re: Seed PCA calculation with shared basis vectors
Posted: Wed Jul 05, 2017 1:44 pm
Site Admin

Hi!

The general approach makes sense; I just want to clarify a few details. (Linear algebra for such problems is often confusing: sometimes you have to work with an orthogonal basis, sometimes with its orthogonal complement.)

The first part of your proposal is completely right. You select a small number N, a small fraction of your dataset's dimensionality, and perform truncated PCA (just N vectors instead of dim(dataset)) on the large combined dataset.
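As a rough illustration of this first step, here is a minimal NumPy sketch. It swaps in a full SVD and keeps the leading N directions, which is not ALGLIB's truncated solver, only a stand-in that gives the same kind of result on small data. It assumes samples are stored as rows; `truncated_pca`, `datasets`, and the sizes are hypothetical.

```python
import numpy as np

def truncated_pca(X, n_components):
    """Leading principal directions of X (samples as rows, an assumption).

    Returns (mean, basis, variances); basis has shape
    (n_features, n_components) with orthonormal columns.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components].T
    variances = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, basis, variances

# Synthetic stand-ins for the "many similar datasets"
rng = np.random.default_rng(0)
datasets = [rng.normal(size=(100, 6)) for _ in range(4)]

# Stack all samples into one large matrix and keep N shared vectors
combined = np.vstack(datasets)
N = 2
_, shared_basis, _ = truncated_pca(combined, N)
print(shared_basis.shape)
```

For large problems a genuinely truncated method would avoid computing all dim(dataset) directions; the full SVD here is only for brevity.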

The second part is a bit trickier. Before you run PCA on a smaller dataset, you should project it onto the shared basis and subtract this projection from the dataset. After this transformation, perform one more truncated PCA to get M "subset vectors". As a result, your subset basis will be orthogonal to the shared basis (the projection-subtraction step enforces this property), and I think that is quite an important property.
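The projection-subtraction step can be sketched as follows. This is only an illustration under stated assumptions: NumPy, samples stored as rows, each subset centered on its own mean, and a shared basis with orthonormal columns. The random `shared_basis`, the helper `leading_directions`, and all sizes are hypothetical placeholders.

```python
import numpy as np

def leading_directions(Xc, k):
    """Top-k principal directions and variances of already-centered data."""
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T, s[:k] ** 2 / (Xc.shape[0] - 1)

rng = np.random.default_rng(1)
n_features, N, M = 6, 2, 2

# Hypothetical orthonormal shared basis (stand-in for the combined-PCA result)
shared_basis, _ = np.linalg.qr(rng.normal(size=(n_features, N)))

# One smaller dataset, centered on its own mean (assumption)
subset = rng.normal(size=(50, n_features))
Xc = subset - subset.mean(axis=0)

# Project onto the shared basis, then subtract the projection
coeffs = Xc @ shared_basis
residual = Xc - coeffs @ shared_basis.T

# Truncated PCA on the residual gives the M "subset vectors"
subset_basis, subset_var = leading_directions(residual, M)

# Cross products between shared and subset vectors should be close to zero
cross = shared_basis.T @ subset_basis
print(np.abs(cross).max())
```

The orthogonality holds for directions with nonzero variance in the residual, so M should not exceed the residual's rank (at most dim(dataset) minus N).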


 Post subject: Re: Seed PCA calculation with shared basis vectors
Posted: Wed Jul 05, 2017 8:34 pm
Thank you, I think that is exactly what I needed to know. Thanks!

