
I am doing sentiment analysis on an imbalanced dataset. My problem is that the naive SVM classifier gives a better ROC-AUC score than SVM+sampling. Here are the naive SVM results (the first bracketed value is the ROC-AUC score, the second is the G-mean score, and the third is the F1 measure):

[image: naive SVM results]

Here are the oversample+SVM results:

[image: oversample+SVM results]

This is my SVM code:

    from sklearn.svm import SVC
    from sklearn import metrics
    from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix, f1_score
    from imblearn.metrics import geometric_mean_score

    # Linear SVM trained on the original (imbalanced) TF-IDF features
    clf = SVC(kernel='linear', C=1, probability=True)
    clf.fit(tf_idf_train3, polarity_train)

    # Probability of the positive class -> ROC curve and ROC-AUC
    probs = clf.predict_proba(tf_idf_test3)
    preds = probs[:, 1]
    fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
    roc_auc = roc_auc_score(polarity_test, preds, average='macro')

    # Hard label predictions -> classification report, confusion matrix, G-mean, F1
    pred = clf.predict(tf_idf_test3)
    print(classification_report(polarity_test, pred))
    print(confusion_matrix(polarity_test, pred))
    gmean = geometric_mean_score(polarity_test, pred, average='macro')
    f1 = f1_score(polarity_test, pred, average='macro')

And this is my svm+oversample code:

    from imblearn.over_sampling import RandomOverSampler

    # `ros` was not defined in the snippet above; it is assumed to be imblearn's RandomOverSampler
    ros = RandomOverSampler()
    X_resample, y_resampled = ros.fit_resample(tf_idf_train3, polarity_train)

    # Same linear SVM, now trained on the oversampled training set
    clf = SVC(kernel='linear', C=1, probability=True)
    clf.fit(X_resample, y_resampled)

    # Evaluation on the untouched test set, identical to the naive SVM case
    probs = clf.predict_proba(tf_idf_test3)
    preds = probs[:, 1]
    fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)
    roc_auc = roc_auc_score(polarity_test, preds, average='macro')
    pred = clf.predict(tf_idf_test3)
    print(classification_report(polarity_test, pred))
    print(confusion_matrix(polarity_test, pred))
    gmean = geometric_mean_score(polarity_test, pred, average='macro')
    f1 = f1_score(polarity_test, pred, average='macro')
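
For context on what the G-mean number measures: in the binary imbalanced-learning literature it is usually defined as the square root of sensitivity times specificity, i.e. it combines the recall on each class at the default 0.5 decision threshold. Below is a minimal sketch of that classic binary form computed straight from a confusion matrix; the `y_true`/`y_pred` arrays are made up purely for illustration, and imblearn's `geometric_mean_score` may aggregate per-class values differently depending on the `average` argument.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Hypothetical labels and predictions for a binary polarity task (illustration only)
    y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
    y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 1])

    # Classic binary G-mean: sqrt(sensitivity * specificity)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    gmean_manual = np.sqrt(sensitivity * specificity)
    print(gmean_manual)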