python-2.7 - How to optimize the precision-recall curve instead of the AUC-ROC curve in Python scikit-learn?

Question

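This is a follow-up question, as suggested in my previous post (good ROC curve but poor precision-recall curve). I am only using the default settings of Python scikit-learn. The optimization seems to target AUC-ROC, but I am more interested in optimizing precision-recall. My code is below.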

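# Imports needed by the metric calls below
from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Get ROC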
y_score = classifierUsed2.decision_function(X_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(false_positive_rate, true_positive_rate)
print 'AUC-'+ethnicity_tar+'=',roc_auc
# Plotting
ax1.plot(false_positive_rate, true_positive_rate, c=color, label=('AUC-'+ethnicity_tar+'= %0.2f'%roc_auc))
ax1.plot([0,1],[0,1], color='lightgrey', linestyle='--')
ax1.legend(loc='lower right', prop={'size':8})

# Get P-R pairs
precision, recall, prThreshold = precision_recall_curve(y_test, y_score)
# Plotting
ax2.plot(recall, precision, c=color, label=ethnicity_tar)
ax2.legend(loc='upper right', prop={'size':8})

Where and how do I insert Python code to change the settings so that precision-recall is optimized?

score 3 · Accepted Answer

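There are actually two questions in your question:

How do you evaluate how good a precision-recall curve is with a single number?
How do you build a model that maximizes this number?

I will answer them in turn: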

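1. The measure of precision-recall curve quality is average precision. Average precision equals the exact area under the non-interpolated (i.e. piecewise-constant) precision-recall curve.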
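As a minimal sketch of that equivalence, the snippet below compares scikit-learn's `average_precision_score` with a manual sum over the steps of the curve returned by `precision_recall_curve`. The labels and scores are toy values invented for illustration; they are not the asker's data.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Toy labels and scores, made up for illustration only.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Average precision as computed by scikit-learn.
ap = average_precision_score(y_true, y_score)

# Same quantity from the curve itself: sum of (recall step) * precision.
# recall is returned in decreasing order, so the differences are negated.
precision, recall, _ = precision_recall_curve(y_true, y_score)
ap_manual = -np.sum(np.diff(recall) * precision[:-1])

print("average precision: %.4f, manual step sum: %.4f" % (ap, ap_manual))
```

In the question's code, the same call would be `average_precision_score(y_test, y_score)`, reusing the scores already obtained from `decision_function`.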

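2. To maximize average precision, you can only tune the hyperparameters of your algorithm. You can do this with GridSearchCV if you set scoring='average_precision'. Alternatively, you can find the best hyperparameters manually or with some other tuning technique.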
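A minimal sketch of such a search might look like the following. The synthetic dataset, the choice of `SVC`, and the parameter grid are all assumptions made for illustration; the import path `sklearn.model_selection` is the one used by current scikit-learn releases (older releases from the Python 2.7 era exposed `GridSearchCV` under `sklearn.grid_search`).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic imbalanced data, a stand-in for the asker's X and y.
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Hypothetical grid; the values are illustrative, not recommendations.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

# scoring='average_precision' makes the search rank candidates by
# cross-validated average precision instead of the default accuracy.
search = GridSearchCV(SVC(), param_grid, scoring="average_precision", cv=3)
search.fit(X, y)

best_params = search.best_params_
best_ap = search.best_score_
print("best params: %s, mean CV average precision: %.3f" % (best_params, best_ap))
```

After fitting, `search.best_estimator_` is refit on the full data with the selected hyperparameters and can be used in place of the original classifier.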

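It is generally not possible to optimize average precision directly (during model fitting), but there are some exceptions. For example, this article describes an SVM that maximizes average precision.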
