python - Statsmodels OLS function for multiple regression parameters

Question

Let's say I want to find the alpha (a) values for an equation of the form

y = a0 + a1*x1 + a2*x2 + ... + ai*xi

Using OLS, let's say we start with 10 values for the basic case of i=2:

# y = a0 + a1*x1 + a2*x2

y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
       [15, 20], [20, 25], [25, 30],[30, 35],
       [35,  5], [ 5, 10], [10, 15]])

Using statsmodels, I would generally use the following code to obtain the coefficients when x and y are both n×1 arrays:

import numpy as np
import statsmodels.api as sm

X = sm.add_constant(x)

# least squares fit
model = sm.OLS(y, X)
fit = model.fit()
alpha=fit.params

But this does not work when x does not have the same shape as y. The equation is here on the first page if you do not know what OLS is.

score 1 · Accepted Answer

The traceback tells you what's wrong:

    raise ValueError("endog and exog matrices are different sizes")
ValueError: endog and exog matrices are different sizes

Your x has 10 rows, but your y has only 9 values. A regression only works if both have the same number of observations.

endog is y and exog is x; those are the names used in statsmodels for the dependent (endogenous) variable and the explanatory (exogenous) variables.
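A quick way to catch this before fitting is to compare the number of observations in each array. A minimal sketch using the arrays from the question (the variable names n_endog, n_exog and same_length are just illustrative):

```python
import numpy as np

# Arrays exactly as given in the question
y = np.arange(1, 10)
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

n_endog = y.shape[0]    # number of observations in y (endog)
n_exog = x.shape[0]     # number of rows in x (exog)
print(n_endog, n_exog)  # 9 10

# OLS needs one y value per row of x
same_length = n_endog == n_exog
print(same_length)      # False
```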

If you replace your y by

y = np.arange(1, 11)

then everything works as expected.

score 0 · Accepted Answer

This is the basic problem above: you say you are using 10 items, but your y vector only has 9.

>>> import numpy
>>> len(numpy.arange(1, 10))
9

This is because slices and ranges in Python go up to, but do not include, the stop integer. If you had done:

numpy.arange(10)

you would have an array of 10 items, starting at 0 and ending with 9.

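For a regression, you need one predicted (response) value for every set of predictors. Otherwise, that set of predictors is useless. You may as well discard any set of predictors that has no predicted value to go with it.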
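If the tenth response really is missing, the alternative to extending y is to drop the unmatched predictor row. A minimal sketch, assuming it is the last row of x that has no corresponding y (the question does not say which one it is):

```python
import numpy as np

y = np.arange(1, 10)  # 9 responses, as in the question
x = np.array([[ 5, 10], [10,  5], [ 5, 15],
              [15, 20], [20, 25], [25, 30], [30, 35],
              [35,  5], [ 5, 10], [10, 15]])

# Assumption: the last row of x is the one without a response
x_trimmed = x[:len(y)]

print(x_trimmed.shape)  # (9, 2)
print(len(y))           # 9
```

After trimming, x_trimmed and y have the same number of observations and can be passed to sm.add_constant and sm.OLS as in the question.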
