I am trying to build a voting classifier that takes several pipelines as its estimators. I am new to this. Here is the code I am using:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf1 = MultinomialNB(alpha=0.99, fit_prior=True)
clf2 = Pipeline([('vect', CountVectorizer(max_features=5000, ngram_range=(1, 2))),
                 ('tfidf', TfidfTransformer(use_idf=True)),
                 ('clf', SGDClassifier(alpha=0.001, learning_rate='optimal', loss='epsilon_insensitive',
                                       penalty='l2', n_iter=100, random_state=42))])
clf3 = Pipeline([('vect', CountVectorizer(max_features=3500)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', SVC(random_state=42, kernel="linear", degree=1, decision_function_shape=None))])
clf4 = Pipeline([('vect', CountVectorizer(max_features=4000)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', RandomForestClassifier(random_state=42, criterion="entropy"))])
eclf = VotingClassifier(estimators=[('mnb', clf1), ('sgd', clf2), ('svm', clf3), ('rf', clf4)], voting='hard')
eclf = eclf.fit(train_data, train_label)

p = eclf.predict(test_data)
np.mean(p == test_class)

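The code builds four classifiers: multinomial naive Bayes, an SGD classifier, an SVM with a linear kernel, and a random forest classifier. When I try to fit my data, it gives me the following error: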

could not convert string to float: "training string here"

If I call fit on a single classifier on its own, the model works fine. Can anyone help?
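
One possible cause, offered as a guess rather than a confirmed diagnosis: VotingClassifier passes the same raw input to every estimator, and clf1 is a bare MultinomialNB with no vectorizer in front of it, so it would receive the raw strings directly. The sketch below wraps every estimator, including naive Bayes, in its own text pipeline. The toy data, the text_pipeline helper, and the reduced set of two estimators are assumptions made for illustration and are not part of the original code.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_data / train_label.
train_data = [
    "good movie", "great film", "great good fun",
    "bad movie", "awful film", "bad awful mess",
]
train_label = [1, 1, 1, 0, 0, 0]


def text_pipeline(classifier):
    """Hypothetical helper: put a vectorizer in front of a classifier
    so that it can be fitted on raw strings."""
    return Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", classifier),
    ])


# Every estimator handed to VotingClassifier is a full pipeline,
# so none of them sees unvectorized text.
eclf = VotingClassifier(
    estimators=[
        ("mnb", text_pipeline(MultinomialNB(alpha=0.99, fit_prior=True))),
        ("rf", text_pipeline(RandomForestClassifier(n_estimators=10, random_state=42))),
    ],
    voting="hard",
)
eclf.fit(train_data, train_label)

predictions = eclf.predict(["good fun film", "awful mess"])
print(predictions)
```

If this is indeed the cause, the same wrapping applied to clf1 in the original code should let the ensemble fit, while the other three pipelines can stay as they are.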