I'm looking to calculate least-squares linear regressions from an N-by-M matrix and a set of known, ground-truth values in an N-by-1 matrix. From there, I'd like to get the slope, intercept, and residual value of each regression. The basic idea is that I know the actual value that should be predicted for each sample (each row of N), and I'd like to use the residuals to determine which set of predicted values (which column of M) is most accurate.
I don't describe matrices well, so here's a drawing:
(N,M) matrix with predicted values for each row N
in each column of M...
##NOTE: Values of M and N are not actually 4 and 3, just examples
4 columns in "M"
[1, 1.1, 0.8, 1.3]
[2, 1.9, 2.2, 1.7] 3 rows in "N"
[3, 3.1, 2.8, 3.3]
(1,N) matrix with actual values of N
[1]
[2] Actual value of each sample N, in a single column
[3]
So again, for clarity's sake, I'm looking to calculate the lstsq regression between each column of the (N,M) matrix and the (1,N) matrix.
For instance, the regression between
[1] and [1]
[2] [2]
[3] [3]
then the regression between
[1] and [1.1]
[2] [1.9]
[3] [3.1]
and so on, outputting the slope, intercept, and standard error (average residual) for each regression calculated.
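To make the desired output concrete, here is a minimal sketch of the one-column-at-a-time version using the example numbers above. It assumes the actual values are the independent variable and each predicted column is the dependent variable, and it interprets "average residual" as the mean absolute residual; both are assumptions, and the names `predicted`, `actual`, and `results` are just for illustration.

```python
import numpy as np

# Example data from the drawing above: 3 samples (rows), 4 prediction sets (columns).
predicted = np.array([[1, 1.1, 0.8, 1.3],
                      [2, 1.9, 2.2, 1.7],
                      [3, 3.1, 2.8, 3.3]])
actual = np.array([1.0, 2.0, 3.0])

results = []
for j in range(predicted.shape[1]):
    y = predicted[:, j]
    # Degree-1 polynomial fit; coefficients come back highest power first.
    slope, intercept = np.polyfit(actual, y, 1)
    # Residuals of this column against its own fitted line.
    residuals = y - (slope * actual + intercept)
    # Assumption: "average residual" means the mean absolute residual.
    results.append((slope, intercept, np.mean(np.abs(residuals))))

for j, (slope, intercept, avg_res) in enumerate(results):
    print(f"column {j}: slope={slope:.3f}, intercept={intercept:.3f}, avg residual={avg_res:.3f}")
```

This gives one (slope, intercept, average residual) tuple per column, which is the output I'm after; the open question is whether the loop can be avoided.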
So far in the numpy/scipy documentation and around the 'net, I've only found examples computing one column at a time. I had thought numpy had the capability to compute regressions on each column in a set with the standard
np.linalg.lstsq(arrayA,arrayB)
But that raises the error
ValueError: array dimensions must agree except for d_0
Do I need to split the columns into their own arrays, then compute one at a time? Is there a parameter or matrix operation I need to use to have numpy calculate the regressions on each column independently?
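For reference, one possible way to avoid splitting the columns is sketched below. `np.linalg.lstsq` requires both arguments to have the same number of rows, and it accepts a two-dimensional right-hand side, solving for each of its columns independently. The sketch assumes the actual values are the regressor and the (N,M) matrix of predictions is the right-hand side; the design matrix `A` with a column of ones for the intercept is an illustrative choice, not something from the original attempt.

```python
import numpy as np

# Example data from the drawing above: 3 samples (rows), 4 prediction sets (columns).
predicted = np.array([[1, 1.1, 0.8, 1.3],
                      [2, 1.9, 2.2, 1.7],
                      [3, 3.1, 2.8, 3.3]])
actual = np.array([1.0, 2.0, 3.0])

# Design matrix of shape (N, 2): the actual values plus a column of ones for the intercept.
A = np.column_stack([actual, np.ones_like(actual)])

# With a 2-D right-hand side, lstsq fits every column of `predicted` in one call.
# coeffs has shape (2, M); ssr holds the sum of squared residuals for each column.
coeffs, ssr, rank, _ = np.linalg.lstsq(A, predicted, rcond=None)
slopes, intercepts = coeffs

# The column with the smallest sum of squared residuals fits its own line best.
best_column = int(np.argmin(ssr))
print(slopes, intercepts, ssr, best_column)
```

Note that the residuals returned here are sums of squared residuals per column, not averages; dividing by the number of samples (or taking a square root) would be a separate step, depending on which error measure is wanted.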
I feel like this should be simpler. I've looked all over, and I can't seem to find anyone doing something similar.