不太确定您在使用 binning 和其余代码做什么。您在 R 中所做的是使用正交多项式进行回归。您可以查看此内容以获取有关使用它的原因的更多信息。
如果您使用的是 statsmodels,您只需要为输入矩阵提供 x、x^2、x^3 的转换值,然后执行 QR 分解,正如本文中所涉及的那样。您可以使用 sklearn 获取矩阵:
import numpy as np
import statsmodels.api as sm
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
np.random.seed(111)
df = pd.DataFrame({'x':np.random.uniform(0,1,50),'y':np.random.poisson(5,50)})
xp = PolynomialFeatures(degree=6).fit_transform(df[['x']])
xp = np.linalg.qr(xp)[0][:,1:]
model = sm.GLM(df['y'],sm.add_constant(xp),family=sm.families.Poisson()).fit()
model.summary()
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 50
Model: GLM Df Residuals: 43
Model Family: Poisson Df Model: 6
Link Function: log Scale: 1.0000
Method: IRLS Log-Likelihood: -100.45
Date: Thu, 04 Mar 2021 Deviance: 33.846
Time: 22:32:57 Pearson chi2: 31.8
No. Iterations: 4
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 1.5458 0.066 23.495 0.000 1.417 1.675
x1 -0.2059 0.439 -0.469 0.639 -1.067 0.655
x2 0.8169 0.466 1.754 0.079 -0.096 1.730
x3 0.1178 0.442 0.267 0.790 -0.748 0.984
x4 -0.5503 0.454 -1.212 0.226 -1.440 0.340
x5 0.0035 0.457 0.008 0.994 -0.892 0.899
x6 -0.6878 0.455 -1.512 0.131 -1.579 0.204
==============================================================================
我们将数据写入 csv
df.to_csv("data.csv")
并使用 R 拟合,您可以看到我们得到相同的系数:
df = read.csv("data.csv",row.names=1)
summary(glm(y ~ poly(x, 6), family = poisson, data = df))
Call:
glm(formula = y ~ poly(x, 6), family = poisson, data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.42648 -0.73970 -0.04625 0.61351 1.81234
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.545808 0.065794 23.495 <2e-16 ***
poly(x, 6)1 -0.205940 0.439457 -0.469 0.6393
poly(x, 6)2 0.816910 0.465770 1.754 0.0794 .
poly(x, 6)3 0.117815 0.441847 0.267 0.7897
poly(x, 6)4 -0.550271 0.454099 -1.212 0.2256
poly(x, 6)5 -0.003508 0.456691 -0.008 0.9939
poly(x, 6)6 0.687751 0.454953 1.512 0.1306
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 40.443 on 49 degrees of freedom
Residual deviance: 33.846 on 43 degrees of freedom
AIC: 214.91