
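The Python package fancyimpute provides several data imputation methods. I tried to use SoftImpute; however, SoftImpute does not provide a transform method that can be used on the test dataset. More precisely, sklearn's SimpleImputer (example below) provides fit, transform, and fit_transform methods. SoftImpute, on the other hand, provides only fit_transform, which lets me fit and impute the training data but not transform the test set. I understand that running imputation on the training and test sets together leaks information from the test set into training. To avoid that, we need to fit on the training set and transform the test set. Is there a way to impute the test set with SoftImpute using what was fitted on the training set? I'd appreciate any ideas.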

    # This example is from https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
    import numpy as np
    from sklearn.impute import SimpleImputer

    imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
    imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])

    X_train = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
    print(imp_mean.transform(X_train))
    # SimpleImputer provides a transform method, so the fitted imputation
    # can be applied to the test data, e.g.
    # X_test = [...]
    # print(imp_mean.transform(X_test))

    from fancyimpute import SoftImpute
    clf = SoftImpute(verbose=True)
    clf.fit_transform(X_train)
    # There is no clf.transform to use with the test set, e.g. clf.transform(X_test)



1 Answer


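fancyimpute does not support an inductive mode. What matters here is that the training data is filled in without using the test data. I think you can then impute the test data with the help of the already imputed training data. Example code: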

    import pandas as pd
    from fancyimpute import SoftImpute

    len_train_data = train_df.shape[0]
    imputer = SoftImpute()
    # Impute the training data on its own
    X_train_fill_SVD = imputer.fit_transform(train_df)
    # Keep the original column names so the concat below aligns with test_df
    X_train_fill_SVD = pd.DataFrame(X_train_fill_SVD, columns=train_df.columns)
    # Concatenate the imputed training data with the still incomplete test data
    Concat_data = pd.concat((X_train_fill_SVD, test_df), axis=0)
    Concat_data = imputer.fit_transform(Concat_data)
    Concat_data = pd.DataFrame(Concat_data)
    # Fetch the imputed test rows
    X_test_fill_SVD = Concat_data.iloc[len_train_data:, ]
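
The same idea can be wrapped in a small helper that works with any imputer exposing only `fit_transform`. This is a minimal sketch, not part of fancyimpute: the names `impute_test_with_train` and `ColumnMeanImputer` are hypothetical, and the column-mean class is only a stand-in so the example runs with NumPy alone. With fancyimpute installed, passing `SoftImpute()` instead should work the same way, assuming its `fit_transform` behaves as in the question.

```python
import numpy as np


def impute_test_with_train(imputer, X_train, X_test):
    """Impute X_train alone, then impute X_test stacked under the filled X_train.

    `imputer` is any object exposing fit_transform(X) -> ndarray
    (hypothetical helper, not a fancyimpute API).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # Step 1: the training rows are filled without ever seeing the test rows.
    train_filled = np.asarray(imputer.fit_transform(X_train))

    # Step 2: stack the already complete training rows on top of the test rows
    # and impute again; only the test rows still contain NaN at this point.
    stacked = np.vstack([train_filled, X_test])
    stacked_filled = np.asarray(imputer.fit_transform(stacked))

    # Step 3: slice the test rows back out.
    test_filled = stacked_filled[len(train_filled):]
    return train_filled, test_filled


class ColumnMeanImputer:
    """Hypothetical stand-in so the sketch runs without fancyimpute installed."""

    def fit_transform(self, X):
        X = np.array(X, dtype=float)  # copy so the caller's array is untouched
        col_means = np.nanmean(X, axis=0)
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = col_means[cols]
        return X


X_train = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
X_test = np.array([[np.nan, 8.0], [7.0, np.nan]])

train_filled, test_filled = impute_test_with_train(
    ColumnMeanImputer(), X_train, X_test
)
print(train_filled)
print(test_filled)
```

Note that the training imputation never sees the test rows, which is the leakage the question is worried about. The test rows are still imputed jointly, so their observed values influence each other's fills; if that matters, the helper could be called one test row at a time.
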
Answered 2021-01-16T09:29:13.140