I am trying to build a voting classifier that takes multiple pipelines as its estimators. I am new to this. Here is the code I am using:
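import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Multinomial naive Bayes, used directly (not inside a pipeline)
clf1 = MultinomialNB(alpha=0.99, fit_prior=True)
# SGD classifier on TF-IDF features of unigrams and bigrams
clf2 = Pipeline([('vect', CountVectorizer(max_features=5000, ngram_range=(1, 2))),
                 ('tfidf', TfidfTransformer(use_idf=True)),
                 ('clf', SGDClassifier(alpha=0.001, learning_rate='optimal', loss='epsilon_insensitive',
                                       penalty='l2', n_iter=100, random_state=42))])
# Linear-kernel SVM on term-frequency features
clf3 = Pipeline([('vect', CountVectorizer(max_features=3500)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', SVC(random_state=42, kernel="linear", degree=1, decision_function_shape=None))])
# Random forest on term-frequency features
clf4 = Pipeline([('vect', CountVectorizer(max_features=4000)),
                 ('tfidf', TfidfTransformer(use_idf=False)),
                 ('clf', RandomForestClassifier(random_state=42, criterion="entropy"))])
# Hard (majority) voting over the four estimators
eclf = VotingClassifier(estimators=[('mnb', clf1), ('sgd', clf2), ('svm', clf3), ('rf', clf4)], voting='hard')
eclf = eclf.fit(train_data, train_label)
p = eclf.predict(test_data)
np.mean(p == test_class)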
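The code builds four classifiers: multinomial naive Bayes, an SGD classifier, an SVM with a linear kernel, and a random forest classifier. When I try to fit the voting classifier on my data, I get the following error: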
could not convert string to float: "training string here"
If I call fit on one of the classifiers by itself, it works fine. Can anyone help?
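
A possible explanation, offered as a sketch rather than a confirmed diagnosis: VotingClassifier fits every estimator on the same input, so an estimator that is not wrapped in a pipeline (here clf1, the bare MultinomialNB) would receive the raw strings and fail when it tries to treat them as numbers. The snippet below reproduces that on a tiny made-up dataset and then wraps the naive Bayes model in its own pipeline. The names toy_texts, toy_labels, mnb_pipe, rf_pipe and toy_eclf are hypothetical, and the vectorizer settings are placeholders, not tuned values.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_data / train_label.
toy_texts = ["good movie", "great film", "bad movie", "awful film"]
toy_labels = [1, 1, 0, 0]

# A bare MultinomialNB cannot be fitted on raw strings.
bare_nb_failed = False
try:
    MultinomialNB().fit(toy_texts, toy_labels)
except ValueError:
    bare_nb_failed = True

# Giving the naive Bayes model its own vectorizer means it sees counts, not text.
mnb_pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", MultinomialNB(alpha=0.99, fit_prior=True)),
])
rf_pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer(use_idf=False)),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=42)),
])

# Every estimator passed to the ensemble now accepts raw text.
toy_eclf = VotingClassifier(
    estimators=[("mnb", mnb_pipe), ("rf", rf_pipe)],
    voting="hard",
)
toy_eclf.fit(toy_texts, toy_labels)
toy_pred = toy_eclf.predict(toy_texts)
```

If this is indeed the cause, wrapping clf1 in a pipeline the same way as clf2 through clf4 should let eclf.fit run on the raw training strings.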