The Python package fancyimpute provides several methods for imputing missing values in Python. Its documentation gives examples such as:
from fancyimpute import IterativeImputer

# X is the complete data matrix
# X_incomplete has the same values as X, except that a subset has been replaced with NaN
# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)
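This works well when the imputation method is applied to a single dataset X. But what if a training/test split is required? Once
X_train_filled = IterativeImputer().fit_transform(X_train_incomplete)
has been called, how do I impute the test set to create X_test_filled? The test set needs to be imputed using information from the training set. I guess IterativeImputer() should return an object that can fit X_test_incomplete. Is that possible?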
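Please note that imputing the entire dataset first and then splitting it into training and test sets is not correct, because information from the test set would leak into the imputation.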
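
To make the desired workflow concrete, here is a minimal sketch of the pattern I am after: fit the imputer on the training set only, then reuse the fitted object on the test set. It assumes the imputer follows scikit-learn's fit/transform convention and uses scikit-learn's own IterativeImputer for illustration; whether the fancyimpute class behaves the same way is exactly what I am unsure about. The synthetic data, the 10% missing rate, and the 75/25 split are assumptions made up for the example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required to enable the import below
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 4 features, roughly 10% of entries set to NaN.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_incomplete = X.copy()
X_incomplete[rng.random(X.shape) < 0.1] = np.nan

# Split BEFORE imputing, so that no test-set information reaches the imputer.
X_train_incomplete, X_test_incomplete = train_test_split(
    X_incomplete, test_size=0.25, random_state=0
)

# Keep a reference to the fitted imputer instead of discarding it.
imputer = IterativeImputer(random_state=0)
X_train_filled = imputer.fit_transform(X_train_incomplete)

# Reuse the models learned on the training set to fill in the test set.
X_test_filled = imputer.transform(X_test_incomplete)
```

The key difference from the one-liner above is that the imputer object is stored in a variable, so calling transform on the test set uses only what was learned from the training set.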