forum.alglib.net

ALGLIB forum
 Post subject: Calculating p-values for ALGLIB's linear model?
Posted: Tue Jul 04, 2023 5:25 am

I've been using the Linear Regression functionality in dataanalysis.cs (linearmodel) to perform multiple linear regression on my data. I've been able to obtain the coefficients and R2 for the output model, but I can't figure out how to calculate the p-values or t-statistics for said model's coefficients.

The outputs I'm aware of are those that can be unpacked with lrunpack():

Quote:
/*************************************************************************
Unpacks coefficients of linear model.

INPUT PARAMETERS:
    LM          -   linear model in ALGLIB format

OUTPUT PARAMETERS:
    V           -   coefficients, array[0..NVars]
                    constant term (intercept) is stored in the V[NVars].
    NVars       -   number of independent variables (one less than number
                    of coefficients)

  -- ALGLIB --
     Copyright 30.08.2008 by Bochkanov Sergey
*************************************************************************/


...as well as those stored in LRReport:

Quote:
/*************************************************************************
LRReport structure contains additional information about linear model:
* C             -   covariation matrix, array[0..NVars,0..NVars].
                    C[i,j] = Cov(A[i],A[j])
* RMSError      -   root mean square error on a training set
* AvgError      -   average error on a training set
* AvgRelError   -   average relative error on a training set (excluding
                    observations with zero function value).
* CVRMSError    -   leave-one-out cross-validation estimate of
                    generalization error. Calculated using fast algorithm
                    with O(NVars*NPoints) complexity.
* CVAvgError    -   cross-validation estimate of average error
* CVAvgRelError -   cross-validation estimate of average relative error

All other fields of the structure are intended for internal use and should
not be used outside ALGLIB.
*************************************************************************/


But I don't see which of these outputs could relate to t-statistics or p-values, if any. Could anyone help me understand if ALGLIB can output either of these values, and if not, could anyone explain how I might go about calculating these values for myself using the provided information?


 Post subject: Re: Calculating p-values for ALGLIB's linear model?
Posted: Thu Jul 20, 2023 1:53 am

So I ended up figuring this out with the help of a friend.

The covariation (covariance) matrix in LRReport (C) gives you the standard error of each coefficient: take the square root of the diagonal values at [0,0], [1,1], ... [n,n]. The t-statistic of each coefficient is then the coefficient minus its value under the null hypothesis (0 in most cases), divided by its standard error. We can then pass the t-statistic to the StudentTDistribution function (https://www.alglib.net/specialfunctions/distributions/student.php), which returns the integral of the t distribution up to that value (its cumulative distribution function). This also requires the degrees of freedom, which is the number of samples minus the number of estimated coefficients (that is, the number of variables plus one when the model has an intercept, as the V[NVars] entry from lrunpack() indicates). The result is easily turned into a p-value: 1 minus the integral at |t| is the one-tailed p-value, and doubling that gives the two-tailed p-value.
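
For reference, the steps above can be written out as formulas. This is only a sketch in notation that is assumed here rather than taken from ALGLIB: $a_i$ is the i-th coefficient, $a_i^{(0)}$ its value under the null hypothesis, $N$ the number of samples, $k$ the number of estimated coefficients (including the intercept, if there is one), and $F_\nu$ the cumulative distribution function of Student's t distribution with $\nu$ degrees of freedom.

```latex
\begin{align*}
\mathrm{SE}_i &= \sqrt{C_{ii}} \\
t_i &= \frac{a_i - a_i^{(0)}}{\mathrm{SE}_i} \\
\nu &= N - k \\
p_i^{\text{one-tailed}} &= 1 - F_\nu\bigl(|t_i|\bigr) \\
p_i^{\text{two-tailed}} &= 2\,\bigl(1 - F_\nu(|t_i|)\bigr)
\end{align*}
```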

Here's a slapped-together C# code block demonstrating how to do this, in case anyone else (like me) has trouble understanding the process. My project needs the output as lists, though it could very easily be modified to output arrays instead.

Code:
   
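// Needs "using System;" and "using System.Collections.Generic;", plus the ALGLIB assembly.
// coefs: coefficients from lrunpack() (intercept last); covariationMatrix: LRReport.C.
// numVariables must count every estimated coefficient (including the intercept, if the
// model has one), so that numSamples - numVariables is the residual degrees of freedom.
public void GetStatisticsOfCovariationMatrix(double[] coefs, double[,] covariationMatrix, int numSamples, int numVariables, out List<double> standardErrors, out List<double> tStatistics, out List<double> pValues)
{
    // Standard error of each coefficient: square root of the matching diagonal entry
    standardErrors = new List<double>();
    for (int i = 0; i <= covariationMatrix.GetUpperBound(0); i++)
    {
        standardErrors.Add(Math.Sqrt(covariationMatrix[i, i]));
    }

    double nullHypothesisCoefficient = 0; // Change this if your null hypothesis is not "coefficient = 0"

    // t-statistic: distance of each coefficient from the null value, in standard errors
    tStatistics = new List<double>();
    for (int i = 0; i < standardErrors.Count; i++)
    {
        tStatistics.Add((coefs[i] - nullHypothesisCoefficient) / standardErrors[i]);
    }

    int degreesOfFreedom = numSamples - numVariables;

    pValues = new List<double>();
    for (int i = 0; i < tStatistics.Count; i++)
    {
        // Integral of the t distribution up to |t|
        double tIntegral = alglib.studenttdistribution(degreesOfFreedom, Math.Abs(tStatistics[i]));
        // The p-value for one tail of the t distribution
        double p1 = 1 - tIntegral;
        // The p-value for both tails of the t distribution
        double p2 = p1 * 2;
        pValues.Add(p2);
    }
}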
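
As a quick sanity check of the first two steps, here is a minimal self-contained sketch with no ALGLIB dependency. The class name TStatSketch, the method name Compute and the NaN handling are hypothetical choices made for illustration, not part of ALGLIB or of the code above. It stops before the p-value step, which still needs alglib.studenttdistribution.

```csharp
using System;

public static class TStatSketch
{
    // Returns the standard errors and t-statistics for a set of coefficients,
    // given their covariance matrix (e.g. the C field of LRReport).
    public static (double[] StandardErrors, double[] TStatistics) Compute(
        double[] coefs, double[,] covariance, double nullValue = 0.0)
    {
        int n = coefs.Length;
        if (covariance.GetLength(0) != n || covariance.GetLength(1) != n)
            throw new ArgumentException("covariance must be n-by-n, matching coefs");

        var se = new double[n];
        var t = new double[n];
        for (int i = 0; i < n; i++)
        {
            // Standard error: square root of the i-th diagonal entry
            se[i] = Math.Sqrt(covariance[i, i]);
            // A zero (or invalid) standard error makes the t-statistic undefined,
            // so report NaN instead of dividing by zero.
            t[i] = se[i] > 0 ? (coefs[i] - nullValue) / se[i] : double.NaN;
        }
        return (se, t);
    }
}
```

With made-up coefficients { 2.0, -0.5, 1.0 } and a diagonal covariance matrix with entries 0.25, 0.0625 and 1.0, Compute returns standard errors 0.5, 0.25 and 1, and t-statistics 4, -2 and 1.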

