I am trying to run a linear model on my data using statsmodels. My data frame looks like this:
0 Group Age Education
3_0001 190.8 1.0 47 12
3_0002 482.1 1.0 44 16
4_0003 144.1 0.0 38 18
4_0004 205.6 0.0 51 15
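The first column is the index. The header of the second column is a 0 with several leading spaces. There are 88 rows of data. My code is as follows: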
import statsmodels.formula.api as sm
formula = "'" + list(df)[0] + " ~ " + list(df)[1] + "'"
model = sm.ols(formula, data=df).fit()
I get an error message that says:
Traceback (most recent call last):
File "AUC.py", line 109, in <module>
model = sm.ols("'"+formula+"'", data=nodeDF_clean).fit()
File "/usr/local/lib64/python3.6/site-packages/statsmodels/base/model.py", line 169, in from_formula
missing=missing)
File "/usr/local/lib64/python3.6/site-packages/statsmodels/formula/formulatools.py", line 65, in handle_formula_data
NA_action=na_action)
File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 310, in dmatrices
NA_action, return_type)
File "/usr/local/lib/python3.6/site-packages/patsy/highlevel.py", line 169, in _do_highlevel_design
return_type=return_type)
File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 893, in build_design_matrices
rows_checker.check(value.shape[0], name, origin)
File "/usr/local/lib/python3.6/site-packages/patsy/build.py", line 795, in check
raise PatsyError(msg, origin)
patsy.PatsyError: Number of rows mismatch between data argument and ' 0 ~ Group' (88 versus 1)
' 0 ~ Group'
^^^^^^^^^^^^^^^^^
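I am using patsy 0.5.1 and Python 3.6.8. I tried renaming the first column to get rid of the leading spaces. I have tried many different variations of the ols formula, all with the same error. What am I doing wrong? Thanks in advance.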
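
A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the formula passed to ols is itself wrapped in literal single quotes, and patsy appears to treat a quoted token as a string constant rather than as a formula, which would account for the "(88 versus 1)" row mismatch. Separately, a column label such as " 0" is not a valid Python identifier, so it is unlikely to work as a bare name in a formula. The snippet below only builds the formula string and a rename mapping; the helper names safe_name and build_formula and the "col" prefix are hypothetical, chosen for illustration.

```python
import re


def safe_name(label, fallback="col"):
    """Turn an arbitrary column label into a valid Python identifier.

    `fallback` is a hypothetical prefix used when the cleaned label
    is empty or starts with a digit.
    """
    # Strip surrounding whitespace and replace non-word characters.
    name = re.sub(r"\W", "_", str(label).strip())
    if not name:
        return fallback
    if name[0].isdigit():
        return fallback + "_" + name
    return name


def build_formula(columns, response_pos=0, predictor_pos=1):
    """Return (formula, mapping) without wrapping the formula in quotes."""
    mapping = {col: safe_name(col) for col in columns}
    response = mapping[columns[response_pos]]
    predictor = mapping[columns[predictor_pos]]
    # Plain string: no extra "'" characters around the formula.
    formula = f"{response} ~ {predictor}"
    return formula, mapping


# Column labels assumed from the data frame shown in the question.
columns = [" 0", "Group", "Age", "Education"]
formula, mapping = build_formula(columns)
print(formula)  # col_0 ~ Group
```

Under these assumptions, the model call would become sm.ols(formula, data=df.rename(columns=mapping)).fit(), with the formula passed as an ordinary string.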