I am trying to run a linear model on my data using statsmodels. My data frame looks like this:

              0     Group    Age    Education
3_0001    190.8      1.0      47       12
3_0002    482.1      1.0      44       16
4_0003    144.1      0.0      38       18
4_0004    205.6      0.0      51       15

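The first column is the index. The header of the second column is a 0 preceded by several leading spaces. There are 88 rows of data. My code is as follows: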

import statsmodels.formula.api as sm

formula = "'" + list(df)[0] + " ~ " + list(df)[1] + "'"
model = sm.ols(formula, data=df).fit()

I get the following error message:

Traceback (most recent call last):
  File "AUC.py", line 109, in <module>
    model = sm.ols("'"+formula+"'", data=nodeDF_clean).fit()
  File "/usr/local/lib64/python3.6/site-packages/statsmodels/base/model.py", line 169, in from_formula
    missing=missing)
  File "/usr/local/lib64/python3.6/site-packages/statsmodels/formula/formulatools.py", line 65, in handle_formula_data
    NA_action=na_action)
  File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 310, in dmatrices
    NA_action, return_type)
  File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 169, in _do_highlevel_design
    return_type=return_type)
  File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 893, in build_design_matrices
    rows_checker.check(value.shape[0], name, origin)
  File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 795, in check
    raise PatsyError(msg, origin)
patsy.PatsyError: Number of rows mismatch between data argument and '      0 ~ Group' (88 versus 1)
    '      0 ~ Group'
    ^^^^^^^^^^^^^^^^^

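I am using patsy 0.5.1 and Python 3.6.8. I tried renaming the first column to get rid of the leading spaces. I have also tried many different variations of the ols formula, and all of them produce the same error. What am I doing wrong? Thanks in advance.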
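For reference, here is a minimal sketch of one way the formula might be built instead. It rests on two assumptions that the post does not confirm. First, the extra `"'"` characters appear to make patsy read the whole formula as a single quoted string literal, which would explain the "88 versus 1" mismatch. Second, the column labels are assumed to be strings; a label such as `0` is not a valid Python identifier, so the sketch wraps each name in patsy's `Q()` quoting helper. The function names `build_formula` and `fit_first_two_columns` are hypothetical and exist only for illustration.

```python
def build_formula(response, predictor):
    """Build a patsy formula string for two column labels.

    Assumes both labels are strings. Each label is wrapped in Q(...),
    patsy's helper for names that are not valid Python identifiers
    (for example "0" or a name containing spaces).
    """
    # repr() turns the label into a quoted Python literal, e.g. '0'.
    # Note that the formula as a whole is NOT wrapped in extra quotes.
    return "Q({!r}) ~ Q({!r})".format(response, predictor)


def fit_first_two_columns(df):
    """Fit OLS of the first column on the second (hypothetical helper)."""
    # Imported lazily so build_formula can be used without statsmodels.
    import statsmodels.formula.api as smf

    columns = list(df)
    formula = build_formula(columns[0], columns[1])
    return smf.ols(formula, data=df).fit()
```

Calling `fit_first_two_columns(df)` on a frame like the one above would then pass a formula of the form `Q('0') ~ Q('Group')` to `ols`. Renaming the column to a valid identifier (for example `y`) and writing `"y ~ Group"` would be a simpler alternative.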
