Calculating p-values for ALGLIB's linear model?

Rammiloh · **Joined:** Thu Aug 23, 2018 5:09 am **Posts:** 5

I've been using the Linear Regression functionality in dataanalysis.cs (linearmodel) to perform multiple linear regression on my data. I've been able to obtain the coefficients and R2 for the output model, but I can't figure out how to calculate the p-values or t-statistics for said model's coefficients.

The outputs I'm aware of are those that can be unpacked with lrunpack():

Quote:

/*************************************************************************
Unpacks coefficients of linear model.

INPUT PARAMETERS:
LM - linear model in ALGLIB format

OUTPUT PARAMETERS:
V - coefficients, array[0..NVars]
constant term (intercept) is stored in the V[NVars].
NVars - number of independent variables (one less than number
of coefficients)

-- ALGLIB --
Copyright 30.08.2008 by Bochkanov Sergey
*************************************************************************/

...as well as those stored in LRReport:

Quote:

/*************************************************************************
LRReport structure contains additional information about linear model:
* C - covariation matrix, array[0..NVars,0..NVars].
C[i,j] = Cov(A[i],A[j])
* RMSError - root mean square error on a training set
* AvgError - average error on a training set
* AvgRelError - average relative error on a training set (excluding
observations with zero function value).
* CVRMSError - leave-one-out cross-validation estimate of
generalization error. Calculated using fast algorithm
with O(NVars*NPoints) complexity.
* CVAvgError - cross-validation estimate of average error
* CVAvgRelError - cross-validation estimate of average relative error

All other fields of the structure are intended for internal use and should
not be used outside ALGLIB.
*************************************************************************/

But I don't see which of these outputs could relate to t-statistics or p-values, if any. Could anyone help me understand if ALGLIB can output either of these values, and if not, could anyone explain how I might go about calculating these values for myself using the provided information?

Rammiloh · **Joined:** Thu Aug 23, 2018 5:09 am **Posts:** 5

So I ended up figuring this out with the help of a friend.

The covariance matrix in LRReport (C, which ALGLIB calls the "covariation matrix") gives the standard error of each coefficient: take the square root of the diagonal entries at [0,0], [1,1], ... [n,n]. The t-statistic of each coefficient is then the coefficient minus its value under the null hypothesis (0 in most cases), divided by its standard error. The StudentTDistribution function (https://www.alglib.net/specialfunctions/distributions/student.php) then gives the integral of the t-distribution up to that t value, i.e. its cumulative distribution function. It also needs the degrees of freedom, which is the number of samples minus the number of estimated coefficients (the intercept counts as one of them, so for a model with an intercept that is NVars + 1). The result of this function is easily converted into either a one-tailed or a two-tailed p-value.
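For anyone who prefers formulas, the steps above can be written out roughly as follows. The symbols are my own shorthand rather than anything from ALGLIB: \hat{\beta}_i is the i-th coefficient from lrunpack(), \beta_{i,0} is its value under the null hypothesis, C is the matrix from LRReport, N is the number of samples, and F_\nu is the cumulative t-distribution with \nu degrees of freedom (what StudentTDistribution returns). The expression for \nu assumes the model was fitted with an intercept.

```latex
\[
\mathrm{SE}_i = \sqrt{C_{ii}}, \qquad
t_i = \frac{\hat{\beta}_i - \beta_{i,0}}{\mathrm{SE}_i}, \qquad
\nu = N - (N_{\mathrm{vars}} + 1)
\]
\[
p_{\text{one-tailed}} = 1 - F_\nu\bigl(|t_i|\bigr), \qquad
p_{\text{two-tailed}} = 2\,\bigl(1 - F_\nu\bigl(|t_i|\bigr)\bigr)
\]
```

As a sanity check against a standard t-table: with \nu = 10, a value of |t_i| = 2.228 corresponds to a two-tailed p-value of about 0.05.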

Here's a quickly put-together C# code block demonstrating this, in case anyone else (like me) has trouble understanding the process. My project needs the output as lists, though it could easily be modified to output arrays instead.

Code:

    
    // Requires: using System; using System.Collections.Generic;
    public void GetStatisticsOfCovariationMatrix(double[] coefs, double[,] covariationMatrix, int numSamples, int numVariables, out List<double> standardErrors, out List<double> tStatistics, out List<double> pValues)
    {
        // Standard error of each coefficient: square root of the matching diagonal entry
        standardErrors = new List<double>();
        for (int i = 0; i <= covariationMatrix.GetUpperBound(0); i++)
        {
            standardErrors.Add(Math.Sqrt(covariationMatrix[i, i]));
        }

        double nullHypothesisCoefficient = 0; // This may need to be changed depending on what your null hypothesis is

        // t-statistic: distance of each coefficient from its null value, in standard errors
        tStatistics = new List<double>();
        for (int i = 0; i < standardErrors.Count; i++)
        {
            tStatistics.Add((coefs[i] - nullHypothesisCoefficient) / standardErrors[i]);
        }

        // numVariables should count every estimated coefficient, including the
        // intercept (NVars + 1 for a model that has a constant term)
        int degreesOfFreedom = numSamples - numVariables;

        pValues = new List<double>();
        for (int i = 0; i < tStatistics.Count; i++)
        {
            // Cumulative t-distribution evaluated at |t|
            double tIntegral = alglib.studenttdistribution(degreesOfFreedom, Math.Abs(tStatistics[i]));
            // The p-value for one tail of the t-distribution
            double p1 = 1 - tIntegral;
            // The p-value for both tails of the t-distribution
            double p2 = p1 * 2;
            pValues.Add(p2);
        }
    }
