I am using statsmodels.api to run a multivariate regression:
import statsmodels.api as sm
model = sm.regression.linear_model.OLS(dependent, X)
results = model.fit()
summary = results.summary()
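Here dependent is a vector of length n, and X is a matrix of dimension m x n, where m is the number of factors.
Each component of X is a row vector whose first entry is the data label and whose next n entries are the data themselves: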
["revenue", 123,456,789.........514]
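To make the layout concrete, here is a minimal sketch of how the labels could be separated from the numbers before fitting. It is plain Python, the helper name split_labeled_rows is hypothetical, and the rows are made-up toy values. As far as I know, OLS expects one row per observation and one column per factor, so the sketch also transposes the data.

```python
# Toy rows in the layout described above: label first, then the n observations.
# The numbers are made up for illustration only.
rows = [
    ["revenue", 123, 456, 789],
    ["const", 1, 1, 1],
    ["costs", 10, 20, 30],
]


def split_labeled_rows(rows):
    """Return (labels, X), where X has one row per observation and one column per factor."""
    labels = [row[0] for row in rows]
    data = [row[1:] for row in rows]
    # Transpose from factor-major (m x n) to observation-major (n x m).
    X = [list(observation) for observation in zip(*data)]
    return labels, X


labels, X = split_labeled_rows(rows)
```

The labels list keeps the same order as the columns of X, so position i in labels describes column i of the design matrix.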
Printing the summary gives:
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.993
Model: OLS Adj. R-squared: 0.987
Method: Least Squares F-statistic: 159.7
Date: Fri, 25 Oct 2013 Prob (F-statistic): 1.99e-31
Time: 12:14:19 Log-Likelihood: -730.93
No. Observations: 71 AIC: 1530.
Df Residuals: 37 BIC: 1607.
Df Model: 34
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 699.6533 421.414 1.660 0.105 -154.212 1553.519
x1 131.5725 266.202 0.494 0.624 -407.803 670.948
x2 -5186.9570 1.04e+04 -0.499 0.621 -2.63e+04 1.59e+04
x3 2.897e+04 1.51e+04 1.925 0.062 -1525.292 5.95e+04
x4 0.7279 0.373 1.950 0.059 -0.029 1.484
x5 -2.794e+05 4.41e+05 -0.634 0.530 -1.17e+06 6.14e+05
x6 -2500.4833 1533.499 -1.631 0.111 -5607.647 606.680
x7 2.202e+04 1.71e+04 1.290 0.205 -1.26e+04 5.66e+04
x8 5.9603 2.597 2.296 0.027 0.699 11.221
x9 -1.41e+07 1.04e+07 -1.354 0.184 -3.52e+07 7.01e+06
x10 -0.3980 0.561 -0.710 0.482 -1.534 0.738
x11 8.862e+04 8.4e+04 1.055 0.298 -8.16e+04 2.59e+05
x12 6.851e+04 4.81e+04 1.426 0.162 -2.89e+04 1.66e+05
x13 1.189e+08 7.23e+07 1.645 0.108 -2.75e+07 2.65e+08
x14 -531.5723 688.333 -0.772 0.445 -1926.268 863.123
x15 290.7228 9702.296 0.030 0.976 -1.94e+04 1.99e+04
x16 -4316.1159 1235.718 -3.493 0.001 -6819.919 -1812.313
x17 -1.0480 18.339 -0.057 0.955 -38.206 36.110
x18 0.4967 1.108 0.448 0.657 -1.749 2.743
x19 -512.3132 680.352 -0.753 0.456 -1890.838 866.211
x20 -6.174e+05 4.15e+05 -1.489 0.145 -1.46e+06 2.23e+05
x21 -20.1921 9.588 -2.106 0.042 -39.620 -0.764
x22 -1109.1907 868.787 -1.277 0.210 -2869.520 651.139
x23 -3.275e-05 1.74e-05 -1.888 0.067 -6.79e-05 2.41e-06
x24 -3.046e+04 1.87e+04 -1.630 0.112 -6.83e+04 7396.892
x25 -8255.2473 4228.299 -1.952 0.058 -1.68e+04 312.100
x26 -0.4144 0.165 -2.515 0.016 -0.748 -0.081
x27 -3.779e+07 2.33e+07 -1.622 0.113 -8.5e+07 9.43e+06
x28 -672.3038 9934.991 -0.068 0.946 -2.08e+04 1.95e+04
x29 1.271e+05 4.71e+04 2.696 0.010 3.16e+04 2.23e+05
x30 11.2359 5.247 2.141 0.039 0.604 21.868
x31 -2.58e+05 8.63e+05 -0.299 0.767 -2.01e+06 1.49e+06
x32 -5.362e+04 2.66e+04 -2.014 0.051 -1.08e+05 318.991
x33 11.7349 6.720 1.746 0.089 -1.880 25.350
x34 -1.71e+06 1.25e+07 -0.137 0.892 -2.71e+07 2.37e+07
x35 -7.6490 8.019 -0.954 0.346 -23.897 8.600
x36 291.4046 178.169 1.636 0.110 -69.601 652.410
x37 510.0672 318.445 1.602 0.118 -135.164 1155.298
==============================================================================
Omnibus: 3.382 Durbin-Watson: 1.864
Prob(Omnibus): 0.184 Jarque-Bera (JB): 2.615
Skew: -0.441 Prob(JB): 0.271
Kurtosis: 3.324 Cond. No. nan
==============================================================================
print results.params gives:
[ 6.99653265e+02 1.31572465e+02 -5.18695704e+03 2.89725201e+04
7.27866154e-01 -2.79412892e+05 -2.50048329e+03 2.20188260e+04
5.96032414e+00 -1.40983228e+07 -3.98040736e-01 8.86220943e+04
6.85055661e+04 1.18927196e+08 -5.31572322e+02 2.90722839e+02
-4.31611590e+03 -1.04803807e+00 4.96741935e-01 -5.12313204e+02
-6.17414913e+05 -2.01921161e+01 -1.10919070e+03 -3.27489243e-05
-3.04625838e+04 -8.25524731e+03 -4.14444321e-01 -3.77917370e+07
-6.72303755e+02 1.27068811e+05 1.12359266e+01 -2.57978901e+05
-5.36154172e+04 1.17349174e+01 -1.71045966e+06 -7.64895526e+00
2.91404563e+02 5.10067167e+02]
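Here the first entry, 699.6533, is the coefficient of the constant term, and the remaining entries follow in order up to x37.
My problem is that the const term can appear at different positions in the summary (not necessarily first). I need a way either to (a) label each factor with the label stored in the first position of its vector, or (b) reliably identify which entry in the summary (and therefore in params) corresponds to the const term.
I would like to do this without an extra package such as pandas.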
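To illustrate what I mean by (a) and (b), here is a rough sketch in plain Python. It relies on the params coming back in the same order as the columns of the design matrix. The helper names find_const_index and label_params are hypothetical, and the matrix and coefficient values are placeholders, not my real data.

```python
def find_const_index(X):
    """Return the index of the first column whose values are all equal and nonzero, else None."""
    for j, column in enumerate(zip(*X)):
        if column[0] != 0 and all(value == column[0] for value in column):
            return j
    return None


def label_params(labels, params):
    """Pair each label with its coefficient, preserving column order."""
    if len(labels) != len(params):
        raise ValueError("labels and params must have the same length")
    return list(zip(labels, params))


# Toy design matrix: one row per observation, one column per factor.
X = [[123, 1, 10], [456, 1, 20], [789, 1, 30]]
labels = ["revenue", "const", "costs"]
params = [0.5, 2.0, -1.5]  # placeholder values standing in for results.params

const_idx = find_const_index(X)       # option (b): position of the constant column
named = label_params(labels, params)  # option (a): (label, coefficient) pairs
```

If I read the statsmodels documentation correctly, results.summary() also accepts an xname argument taking a list of names, so results.summary(xname=labels) might print the labels in place of x1, x2, and so on, but I have not confirmed that this works for my version.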
Any help would be appreciated.
Thanks!