
I wrote code that works with patsy and the formula API, but now I want to run a "predict" to verify the results I found in the summary. How can I predict my variable?

import numpy as np
from statsmodels.formula.api import logit

FNAME = "C:/Users/lenovo/Desktop/table.csv"

my_data = np.genfromtxt(FNAME, delimiter=',')

x = my_data[:, 1]
d = my_data[:, 4]
f = my_data[:, 6]
c = my_data[:, 3]

# Build a mask that excludes rows containing NaN in any of the columns used
masque = ~(np.isnan(x) | np.isnan(d) | np.isnan(f) | np.isnan(c))

# Keep only the complete rows; x is shifted down by 1
# (presumably so that the outcome is coded 0/1)
x = my_data[masque, 1] - 1
d = my_data[masque, 4]
f = my_data[masque, 6]
c = my_data[masque, 3]

my_data_dict = dict(x=x, d=d, f=f, c=c)

# Treat c, d and f as categorical predictors
form = 'x ~ C(c) + C(d) + C(f)'

affair_model = logit(form, my_data_dict, missing='drop')
affair_result = affair_model.fit()

print(affair_result.summary())

1 Answer


In this line:

data = df[cols_to_keep].join(dummy_ranks1.ix[:, 'c_2':]).join(dummy_ranks3.ix[:, 'd_2':]).join(dummy_ranks2.ix[:, 'f_2':])

you're selecting only the columns in cols_to_keep, ['a', 'b'], and then joining them with other DataFrames that don't contain x, so x never ends up in data.

Simply change

cols_to_keep = ['a', 'b']

to

cols_to_keep = ['a', 'b', 'x']

For one-off scripts like this, it's not a bad idea to use sanity checks with assert to make sure it's doing what you expect, e.g.,

assert 'x' in data, 'x is not a column in data'
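To make the failure and the check concrete, here is a minimal sketch on a made-up three-row table. The DataFrame contents, the single categorical column c, and the helper build are assumptions for illustration only, not the asker's data. It uses .loc for the label slice, since .ix was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical stand-in for the real table
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [0.5, 1.5, 2.5],
    "c": [1, 2, 3],
    "x": [0, 1, 0],
})

# Dummy columns c_1, c_2, c_3 for the categorical column c
dummy_c = pd.get_dummies(df["c"], prefix="c")


def build(cols_to_keep):
    """Join the kept columns with the dummies, dropping the first level."""
    return df[cols_to_keep].join(dummy_c.loc[:, "c_2":])


without_x = build(["a", "b"])
with_x = build(["a", "b", "x"])

# 'name in DataFrame' tests column labels
assert "x" not in without_x
assert "x" in with_x, "x is not a column in data"
```

With cols_to_keep = ['a', 'b'] the second assert would fail immediately, which is easier to diagnose than an error raised later when the model is fitted.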

Since x is now back in data, you'll also need to change train_cols so that it excludes x:

cols = data.columns
train_cols = cols[cols != 'x'][1:]
Answered 2013-08-28T12:45:56.843