python - Ordinal logistic regression in Python with rpy2 (the Python interface to R): problem with collinear predictors

Question

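I am trying to run an ordinal logistic regression in Python by calling R's mass.polr function through rpy2 (the Python interface to the R language). However, I run into trouble when some of my predictor columns are collinear or nearly collinear: mass.polr automatically drops some of those columns during fitting, which then causes an error when I try to predict on the training data.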

Here is a minimal example:

import pandas as pd
from rpy2.robjects import r, pandas2ri
from rpy2.robjects.packages import importr

pandas2ri.activate()

mass = importr("MASS")

# dataframe with two collinear predictors (x1 and x2)
df = pd.DataFrame(columns = ['target', 'x1', 'x2', 'x3'],
                  data    = [[   0   ,  0  ,  0  ,  1  ],
                             [   1   ,  1  ,  1  ,  0  ],
                             [   2   ,  1  ,  1  ,  1  ]])

model = mass.polr('as.factor(target) ~ .', df, Hess = True) # gives warning below
'''
Warning message:
In polr(as.factor(target) ~ ., data = df, Hess = TRUE) :
  design appears to be rank-deficient, so dropping some coefs

'''

r.predict(model, df, type = "class").__array__() # gives error below
'''
Error in X %*% object$coefficients : non-conformable arguments
'''

The same error actually occurs in R as well, but there I can at least see which coefficients were kept by looking at summary(model).

In Python, by contrast, r.summary(model).rx2('coefficients') (which should show the same output as summary(model) in R) does not show the coefficient names, only the bare values:

array([[4.57292582e+01, 8.25605929e+02, 5.53887231e-02],
       [2.11604944e+01, 2.85721885e+02, 7.40597606e-02],
       [3.19476895e+01, 3.60605165e+02, 8.85946531e-02],
       [5.66312792e+01, 8.93862000e+02, 6.33557296e-02]])

Does anyone know a way to retrieve the coefficient names in Python? Or is there some other workaround?

score 0 · Accepted Answer

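r.summary(model).rx2('coefficients') returns an object without names because earlier in the script (the line pandas2ri.activate()) you asked for R objects to be converted to pandas (and, implicitly, numpy) objects. Numpy arrays do not have named elements.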

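Using activate is no longer recommended. Consider using a local converter in a context manager instead (see the example in the pandas section of the documentation: https://rpy2.github.io/doc/v3.3.x/html/generated_rst/pandas.html).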
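
A minimal sketch of that approach, assuming rpy2 3.x with R and the MASS package installed. The function names coefficient_table and column_major_to_frame are made up for illustration and are not part of rpy2. The pandas data frame is converted only inside the localconverter block, so the fitted model and its summary stay rpy2 objects that still carry their R row and column names:

```python
import pandas as pd


def column_major_to_frame(values, row_names, col_names):
    """Rebuild a labelled table from R's flat, column-major matrix storage."""
    n_rows = len(row_names)
    columns = {
        name: values[j * n_rows:(j + 1) * n_rows]
        for j, name in enumerate(col_names)
    }
    return pd.DataFrame(columns, index=row_names)


def coefficient_table(df, formula="as.factor(target) ~ ."):
    """Fit MASS::polr and return summary(model)$coefficients with its names.

    Hypothetical helper: the rpy2 imports live inside the function so the
    module can be imported even where R is not available.
    """
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr

    mass = importr("MASS")

    # Convert pandas -> R only inside this block; no global activate().
    converter = ro.default_converter + pandas2ri.converter
    with localconverter(converter):
        r_df = converter.py2rpy(df)

    # Outside the block the results remain rpy2 objects, names included.
    model = mass.polr(ro.Formula(formula), data=r_df, Hess=True)
    coefs = ro.r["summary"](model).rx2("coefficients")

    # An R matrix iterates as a flat sequence in column-major order.
    return column_major_to_frame(
        list(coefs), list(coefs.rownames), list(coefs.colnames)
    )
```

With the data frame from the question, coefficient_table(df).index should list the names of the coefficients and intercepts that polr kept, which in turn shows which predictor was dropped. If only the names are needed, list(model.rx2('coefficients').names) on the unconverted model is a shorter route.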
