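The Python package fancyimpute provides several data imputation methods. I tried to use SoftImpute; however, SoftImpute does not provide a transform method that can be applied to a test dataset. More precisely, sklearn's SimpleImputer (shown below) provides fit, transform, and fit_transform methods. SoftImpute, on the other hand, provides only fit_transform, which lets me fit and impute the training data but not transform the test set. I know that imputing on the combined training and test sets leaks information from the test set into training. To avoid that, we need to fit on the training set and transform the test set. Is there a way to impute the test set with SoftImpute using what was fitted on the training set? I would appreciate any ideas.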
# This example is adapted from https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
import numpy as np
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
X_train = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
print(imp_mean.transform(X_train))
# SimpleImputer provides a transform method, so the fitted imputer can be
# applied to the test data, e.g.
# X_test = [...]
# print(imp_mean.transform(X_test))
from fancyimpute import SoftImpute
clf = SoftImpute(verbose=True)
clf.fit_transform(X_train)
# There is no clf.transform to use with a test set, e.g. clf.transform(X_test)
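One possible workaround, given only as a sketch: impute the training set on its own, then impute the test set by stacking it under the training rows, calling fit_transform on the stacked matrix, and keeping only the test rows. The training matrix used for modelling never sees the test data, although the test rows still influence their own imputed values, so this is not a true transform. The helper `impute_test_with_train` and the stand-in class `ColumnMeanImputer` are hypothetical names introduced here so the snippet runs without fancyimpute; the assumption is that `SoftImpute()` can be swapped in because it exposes the same fit_transform call used in the question.

```python
import numpy as np


class ColumnMeanImputer:
    """Hypothetical stand-in that, like SoftImpute, exposes only fit_transform."""

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float)
        col_means = np.nanmean(X, axis=0)           # per-column mean, ignoring NaN
        return np.where(np.isnan(X), col_means, X)  # fill only the missing cells


def impute_test_with_train(imputer, X_train, X_test):
    """Impute X_test by stacking it under X_train and keeping only the test rows.

    The training rows completed here are discarded; the training matrix used
    for modelling should come from a separate fit_transform on X_train alone.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    stacked = np.vstack([X_train, X_test])
    completed = imputer.fit_transform(stacked)
    return completed[len(X_train):]


X_train = np.array([[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]])
X_test = np.array([[1, np.nan, 5]])

# Assumption: fancyimpute.SoftImpute() could replace ColumnMeanImputer() here.
imputer = ColumnMeanImputer()
X_train_filled = imputer.fit_transform(X_train)  # fitted on training rows only
X_test_filled = impute_test_with_train(imputer, X_train, X_test)
print(X_test_filled)
```

If test rows should not influence each other either, the helper can be called with one test row at a time, at the cost of one fit per row.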