python - Can you use a multiclass classifier on one feature as a processing/feature-engineering step ahead of a (stacking) classifier?

Question

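Can you use a multiclass classifier on a single feature as a processing/feature-engineering step whose output feeds a stacking classifier?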

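Use case: you have 10 features available for a multiclass classification problem. One of the features is text; the others are categorical, numeric, and temporal.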

Nine of the features go through typical pipeline steps.

This is similar to the sklearn "Column Transformer with Mixed Types" example:

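from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: fill gaps with the median, then standardize.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

# Categorical columns: fill gaps with a constant, then one-hot encode.
categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])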

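The text feature is "preprocessed", or engineered, by a text pipeline that contains a domain-specific trained vector model. Its output is a 100-dimensional vector/array, which is passed to a multiclass classifier that outputs class probabilities ("predict_proba").

These probabilities are then combined with the features from the preprocessor above before being passed on to a classifier/stacking classifier: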

text_features = ['domain text']
text_transformer = Pipeline(steps=[
    # TextVectorizer is my custom, domain-trained vectorizer (100-dim output)
    ('text_vectors', TextVectorizer()),
    # params='awesome' is a placeholder for real hyperparameters
    ('predict_prob', DecisionTreeClassifier(params='awesome'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('text', text_transformer, text_features),
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
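One way the 'predict_prob' step might be made to behave like a transformer is a thin wrapper that fits a classifier in fit() and returns its predict_proba output from transform(). The following is only a minimal sketch under assumptions: `ProbaTransformer` is a hypothetical name, not an sklearn class, and random numbers stand in for the 100-dimensional text vectors.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.tree import DecisionTreeClassifier


class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: exposes a classifier's predict_proba as transform()."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a fresh clone so the estimator passed to __init__ stays untouched.
        base = DecisionTreeClassifier() if self.estimator is None else self.estimator
        self.estimator_ = clone(base).fit(X, y)
        return self

    def transform(self, X):
        # One column per class, ordered like self.estimator_.classes_.
        return self.estimator_.predict_proba(X)


# Toy data standing in for the 100-dimensional text vectors (assumption).
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(60, 100))
y_demo = np.arange(60) % 3  # three classes, 20 rows each

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
proba_features = ProbaTransformer(tree).fit_transform(X_demo, y_demo)
```

Because Pipeline and ColumnTransformer pass y through to each step's fit, a wrapper like this could in principle sit where 'predict_prob' is in text_transformer. One caveat: the probabilities produced for the training rows are in-sample predictions, so the downstream classifier may come to trust them more than it should.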

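I have tried several different ways of using the sklearn mixins (BaseEstimator, TransformerMixin, ClassifierMixin) to build one or more custom classes that do this, but I have failed, with lots of errors. It seems I cannot get a classifier to work inside a transformer class.

Here is an example of one of my many attempts: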

class TextVectorClassifierProba(ClassifierMixin, BaseEstimator, TransformerMixin):
    """ """

    def __init__(self):
        pass

    def transform(self, df, y=None):
        """The workhorse of this feature extractor"""
        vector = Pipeline(steps=[('word_vectorizer', TextVectorTransformer()),
                                 ('classifier', DecisionTreeClassifier())])

        return vector.fit(df).predict_proba(df)

    def fit(self, df, y=None):
        """Returns `self` unless something different happens in train and test"""
        return self

The real question is: can you fit and then predict on a subset of the data as part of a transform inside a stacked ensemble?
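For comparison, sklearn's StackingClassifier already does something close to this: each base estimator can be a Pipeline that selects its own column subset, and the final estimator is trained on cross-validated predict_proba outputs. The sketch below rests on assumptions: the toy columns `v0` and `v1` merely stand in for the text vectors, and `on_columns` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy frame: v0/v1 stand in for text vectors, age/fare for tabular features.
rng = np.random.RandomState(0)
n = 90
df = pd.DataFrame({
    "v0": rng.normal(size=n),
    "v1": rng.normal(size=n),
    "age": rng.uniform(18, 80, size=n),
    "fare": rng.uniform(5, 100, size=n),
})
y = np.arange(n) % 3  # three classes, 30 rows each


def on_columns(columns, estimator, scale=False):
    """Hypothetical helper: restrict an estimator to a subset of columns."""
    step = StandardScaler() if scale else "passthrough"
    selector = ColumnTransformer([("keep", step, columns)])  # other columns are dropped
    return Pipeline([("select", selector), ("model", estimator)])


stack = StackingClassifier(
    estimators=[
        ("text", on_columns(["v0", "v1"],
                            DecisionTreeClassifier(max_depth=3, random_state=0))),
        ("tabular", on_columns(["age", "fare"],
                               LogisticRegression(max_iter=1000), scale=True)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # base outputs are class probabilities
    cv=3,  # the final estimator is trained on out-of-fold predictions
)
stack.fit(df, y)

meta_features = stack.transform(df)  # probabilities from both base estimators
predictions = stack.predict(df)
```

This differs from the design described above in that the final estimator sees only the base estimators' probabilities. Setting passthrough=True would also hand it the original input columns, though unprocessed, so it is not a drop-in match for combining probabilities with the preprocessed features.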

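Can this actually work? Has anyone seen something like this, or am I just dreaming crazy dreams?

Any insights or ideas would be greatly appreciated.

Thanks!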
