
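I'm trying to replicate the functionality of Statsmodels' weighted least squares (WLS) function using NumPy's ordinary least squares (OLS) function (NumPy simply calls OLS "least squares").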

In other words, I want to compute WLS in NumPy. I'm using this Stack Overflow post as a reference, but Statsmodels and NumPy give me wildly different R² values.
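For context, the reduction I'm relying on is that WLS equals OLS after multiplying each row of the design matrix and of y by sqrt(w_i). The sketch below checks that reduction on a small hypothetical dataset (not the data from this question) by comparing np.linalg.lstsq on the scaled rows against the weighted normal equations (XᵀWX)β = XᵀWy; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical example data (not from the question): intercept + one regressor
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
w = np.array([1.0, 0.5, 2.0, 1.0, 0.25])

# WLS via OLS: scale each row of X and y by sqrt(w_i), then solve ordinary least squares
sw = np.sqrt(w)
beta_lstsq = np.linalg.lstsq(X * sw[:, np.newaxis], y * sw, rcond=None)[0]

# Closed form from the weighted normal equations: (X^T W X) beta = X^T W y
W = np.diag(w)
beta_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(np.allclose(beta_lstsq, beta_normal))  # True
```

So the coefficients should agree; any discrepancy has to come from somewhere other than the fit itself.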

Take the following example code, which reproduces the problem:

import numpy as np
import statsmodels.formula.api as smf
import pandas as pd

# Test Data
patsy_equation = "y ~ C(x) - 1" # Use "- 1" to get rid of the implicit "+ 1" intercept
weight = np.array([0.37, 0.37, 0.53, 0.754])
y = np.array([0.23, 0.55, 0.66, 0.88])
x = np.array([3, 3, 3, 3])
d = {"x": x.tolist(), "y": y.tolist()}
data_df = pd.DataFrame(data=d)

# Weighted Least Squares from the Statsmodels API
statsmodel_model = smf.wls(formula=patsy_equation, weights=weight, data=data_df)
statsmodel_r2 = statsmodel_model.fit().rsquared      

# Weighted Least Squares from Numpy API
Aw = x.reshape((-1, 1)) * np.sqrt(weight[:, np.newaxis]) # Scale each row of the design matrix by sqrt(weight)
Bw = y * np.sqrt(weight)
numpy_model, numpy_resid = np.linalg.lstsq(Aw, Bw, rcond=None)[:2]
numpy_r2 = 1 - numpy_resid / (Bw.size * Bw.var())

print("Statsmodels R²: " + str(statsmodel_r2))
print("Numpy R²: " + str(numpy_r2[0]))

After running this code, I get the following results:

Statsmodels R²: 2.220446049250313e-16
Numpy R²: 0.475486515775414

Clearly something is wrong here! Can anyone point out my mistake? Am I misunderstanding the patsy formula?
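One thing worth checking is the R² denominator rather than the fit. The sketch below rests on two assumptions: that a single-level C(x) - 1 expands to one dummy column of ones (so the model is intercept-only), and that Statsmodels' WLS R² uses a weighted total sum of squares around the weighted mean of y. Under those assumptions it compares that denominator with the one in my code, Bw.size * Bw.var(), which centers the sqrt-weighted Bw around its plain mean instead. The names r2_weighted and r2_question are illustrative.

```python
import numpy as np

# Data from the question
weight = np.array([0.37, 0.37, 0.53, 0.754])
y = np.array([0.23, 0.55, 0.66, 0.88])

# Assumption: with one level, C(x) - 1 yields a single dummy column of ones
X = np.ones((y.size, 1))

# WLS as OLS on sqrt-weighted rows; lstsq returns the weighted SSR as a length-1 array
sw = np.sqrt(weight)
Xw = X * sw[:, np.newaxis]
yw = y * sw
beta, ssr = np.linalg.lstsq(Xw, yw, rcond=None)[:2]
ssr = ssr[0]

# Weighted, centered total sum of squares around the weighted mean of y
y_bar_w = np.average(y, weights=weight)
tss_w = np.sum(weight * (y - y_bar_w) ** 2)
r2_weighted = 1 - ssr / tss_w

# Denominator used in the question: centered TSS of the transformed yw
r2_question = 1 - ssr / (yw.size * yw.var())

print(r2_weighted)  # essentially zero
print(r2_question)  # about 0.4755
```

If those assumptions hold, an intercept-only model's SSR equals the weighted TSS, so R² is essentially zero as Statsmodels reports, and the 0.4755 comes only from the different denominator.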
