I am doing sentiment analysis on an imbalanced dataset. My problem is that the naive SVM classifier gives a better ROC-AUC score than SVM + sampling. Here are the naive SVM results (the first bracketed value is the ROC-AUC score, the second is the G-mean score, and the third is the F1 measure):
Here are the oversample + SVM results:
And here is my naive SVM code:
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix, f1_score
from imblearn.metrics import geometric_mean_score

# Naive SVM: fit directly on the imbalanced TF-IDF training data.
clf = SVC(kernel='linear', C=1, probability=True)
clf.fit(tf_idf_train3, polarity_train)

# Probability scores for the positive class drive ROC-AUC; hard labels drive the other metrics.
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
pred = clf.predict(tf_idf_test3)

roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')
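For context, both snippets assume the TF-IDF matrices and polarity labels already exist. A minimal sketch of how they could be built is below; the raw texts, labels, and vectorizer settings here are placeholders, not my actual preprocessing:

from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder texts and binary polarity labels; the real dataset is much larger and imbalanced.
train_texts = ["love this phone", "great battery", "nice screen", "works well", "happy with it",
               "super fast", "really good", "terrible support", "broke quickly", "very bad"]
polarity_train = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
test_texts = ["good value", "awful experience", "excellent camera", "bad quality"]
polarity_test = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
tf_idf_train3 = vectorizer.fit_transform(train_texts)  # fit the vocabulary on training text only
tf_idf_test3 = vectorizer.transform(test_texts)        # reuse the fitted vocabulary for the test set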
And here is my SVM + oversample code:
from imblearn.over_sampling import RandomOverSampler

# `ros` is not defined in the snippet; a default RandomOverSampler is assumed here.
ros = RandomOverSampler()

# Oversample only the training set, then fit the same SVM on the balanced data.
clf = SVC(kernel='linear', C=1, probability=True)
X_resample, y_resampled = ros.fit_resample(tf_idf_train3, polarity_train)
clf.fit(X_resample, y_resampled)

# Evaluate on the untouched (imbalanced) test set, same as for the naive SVM.
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
pred = clf.predict(tf_idf_test3)

roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')