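I have a two-stage meta-estimator that is initialized with two pipelines. The estimator is meant to classify observations as 1, -1, or 0. The first pipeline learns to separate 0 from (1, -1); the second pipeline learns to separate 1 from -1, with all the 0 rows removed. Here is the code for the meta-estimator: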
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.utils.validation import check_is_fitted

class TwoStageEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, pipeline_1, pipeline_2):
        self.pipeline_1 = pipeline_1
        self.pipeline_2 = pipeline_2

    def fit(self, X, y):
        # First-stage training: 0 vs. non-zero
        self.pipeline_1 = clone(self.pipeline_1)
        y_train_1 = abs(y)
        self.pipeline = self.pipeline_1.fit(X, y_train_1)
        # Second-stage training: 1 vs. -1, zero rows dropped
        self.pipeline_2 = clone(self.pipeline_2)
        y_train_2 = y[y != 0]
        X_train_2 = X.loc[y_train != 0, ]
        self.pipeline = self.pipeline_2.fit(X_train_2, y_train_2)
        # Set fit status
        self.is_fit_ = True
        return self

    def predict(self, X):
        # Check that fit has been called
        check_is_fitted(self)
        y = self.pipeline_1.predict(X) * self.pipeline_2.predict(X)
        return y
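To make the label handling concrete, here is a minimal sketch of the decomposition on toy labels. The arrays and the "perfect" stage predictions are made up for illustration and are not output from the pipelines above.

```python
import numpy as np

# Toy labels (illustrative only)
y = np.array([1, 0, -1, 1, 0, -1])

# Stage 1 target: 0 stays 0, both 1 and -1 become 1
y_stage_1 = np.abs(y)

# Stage 2 target: only the non-zero rows, keeping their sign
nonzero = y != 0
y_stage_2 = y[nonzero]

# Hypothetical predictions from two perfect stages.
# Stage 2 still has to output something for the zero rows at predict time;
# the value there is arbitrary because stage 1 multiplies it by 0.
pred_1 = y_stage_1.copy()
pred_2 = np.where(y == 0, 1, y)

# Recombine the two stages the same way predict() does
combined = pred_1 * pred_2
```

The product recovers the original three-class labels only if stage 1 predicts exactly 0 or 1 and stage 2 predicts exactly 1 or -1.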
If I call the estimator directly like this, it works:
tsm = TwoStageEstimator(pipeline, pipeline)
prd_stance = tsm.fit(X_train, y_train).predict(X_test)
But when I try to use cross-validation, it breaks:
scores = cross_val_score(
    tsm, X, y, scoring='accuracy', cv=ms.StratifiedKFold(n_splits=7, shuffle=True)
)
scores
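The error message seems to indicate that the problem is a conflict between the indexing done inside fit and the indexing done by the CV splitter: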
raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
...
raise NotImplementedError(
NotImplementedError: iLocation based boolean indexing on an integer type is not available
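The first error can be reproduced outside scikit-learn. The sketch below is an assumption about what is going on, not a confirmed diagnosis: it builds a boolean mask from a Series whose index does not cover the rows of the frame being indexed (as would happen if the mask came from an outer `y_train` rather than from the `y` passed to `fit`), then builds the mask from the fold's own labels instead. The toy data and the names `y_outer`, `X_fold`, and `y_fold` are hypothetical.

```python
import pandas as pd

# Toy data (illustrative only)
X = pd.DataFrame({"feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
y = pd.Series([1, 0, -1, 1, 0, -1])

# An outer "training split" (rows 0-3), standing in for a global y_train
y_outer = y.iloc[[0, 1, 2, 3]]

# A CV fold drawn from the full data; rows 4 and 5 are not in y_outer
X_fold, y_fold = X.iloc[[2, 3, 4, 5]], y.iloc[[2, 3, 4, 5]]

# Mask built from the wrong Series: its index does not cover the fold's rows.
# The exception is caught generically because the import location of
# IndexingError differs between pandas versions.
try:
    X_fold.loc[y_outer != 0]
    error_name = None
except Exception as err:
    error_name = type(err).__name__

# Mask built from the fold's own labels, converted to a plain array so the
# selection is positional and no index alignment is involved
mask = (y_fold != 0).to_numpy()
X_stage_2 = X_fold.loc[mask]
y_stage_2 = y_fold.loc[mask]
```

If this is indeed the cause, deriving the mask inside `fit` from its own `y` argument would keep the two stages aligned with whatever fold the splitter passes in.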
Can anyone point me toward a solution?