
Is there a way in sklearn to combine different classifiers into one? I found the sklearn.ensemble package. It contains different models, such as AdaBoost and RandomForest, but they use decision trees under the hood, and I would like to use different methods, such as SVM and logistic regression. Is this possible with sklearn?


2 Answers


Do you just want to do majority voting? That is not implemented afaik. But as I said, you can average the predict_proba scores. Or you can apply a LabelBinarizer to the predictions and average those. That would implement a voting scheme.

Even if you are not interested in the probabilities, averaging the predicted probabilities may be more robust than doing a simple vote. This is hard to tell without trying it out, though.
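
For example, a minimal sketch of this probability-averaging idea (assuming X_train, y_train and X_test already exist, and using LogisticRegression and SVC purely as illustrative choices) could look like this:

# Average predict_proba scores across several classifiers (soft voting by hand)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# probability=True is needed so SVC exposes predict_proba
models = [LogisticRegression(), SVC(probability=True)]

# Fit every model on the same training data
for model in models:
    model.fit(X_train, y_train)

# Average the predicted class probabilities over all models
avg_proba = np.mean([model.predict_proba(X_test) for model in models], axis=0)

# Pick the class with the highest averaged probability
y_pred = models[0].classes_[np.argmax(avg_proba, axis=1)]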

Answered 2013-04-01T12:00:51.690

Yes, you can train different models on the same dataset and let each model make its own predictions.

# Import functions to compute accuracy and split data
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Import models, including VotingClassifier meta-model
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier

# Set seed for reproducibility
SEED = 1

Now instantiate these models:

# Instantiate lr
lr = LogisticRegression(random_state = SEED)

# Instantiate knn
knn = KNN(n_neighbors = 27)

# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf = 0.13, random_state = SEED)

Then define the classifiers as a list, so that these different classifiers can later be combined into a single meta-model:

classifiers = [('Logistic Regression', lr), 
               ('K Nearest Neighbours', knn), 
               ('Classification Tree', dt)]

Now use a for loop to iterate over this predefined list of classifiers:

for clf_name, clf in classifiers:    

    # Fit clf to the training set
    clf.fit(X_train, y_train)    

    # Predict y_pred
    y_pred = clf.predict(X_test)

    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)

    # Evaluate clf's accuracy on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy))

Finally, we evaluate the performance of the voting classifier, which takes the outputs of the models defined in the classifiers list and assigns labels by majority vote.

# Voting Classifier
# Instantiate a VotingClassifier vc
vc = VotingClassifier(estimators = classifiers)     

# Fit vc to the training set
vc.fit(X_train, y_train)   

# Evaluate the test set predictions
y_pred = vc.predict(X_test)

# Calculate accuracy score
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))
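
If you instead want to combine an SVM with logistic regression, as the question asks, the same VotingClassifier pattern applies; one possible sketch, using soft voting (averaging predicted probabilities rather than counting hard votes), could look like this:

# Combining an SVM and logistic regression with soft voting
# (probability=True lets SVC provide predict_proba for the averaging)
from sklearn.svm import SVC

svm_lr_vc = VotingClassifier(
    estimators=[('Logistic Regression', LogisticRegression(random_state=SEED)),
                ('SVM', SVC(probability=True, random_state=SEED))],
    voting='soft')

svm_lr_vc.fit(X_train, y_train)
y_pred = svm_lr_vc.predict(X_test)
print('SVM + LR Voting Classifier: {:.3f}'.format(accuracy_score(y_test, y_pred)))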
Answered 2021-04-01T18:11:22.323