math - How should I order these "helpful" scores?

Question

Under the user-generated posts on my site, I have an Amazon-like rating system:

   Was this review helpful to you: Yes | No

If there are votes, I display the results above that line, like so:

   5 of 8 people found this reply helpful.

I want to sort the posts by these ratings. If you were ranking from most helpful to least helpful, how would you order the following posts?

   a) 1/1 = 100% helpful
   b) 2/2 = 100% helpful
   c) 999/1000 = 99.9% helpful
   d) 3/4 = 75% helpful
   e) 299/400 = 74.8% helpful

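Obviously it's not right to sort on the helpful percentage alone; the total number of votes should somehow be factored in. Is there a standard way to do this?

Update:

Using Charles' formula to compute the lower bound of the Agresti-Coull interval and sorting on that, the examples above would be ordered as follows: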

   1) 999/1000 (99.9%) = 95% likely to fall in 'helpfulness' range of 99.2% to 100%
   2) 299/400 (74.8%) = 95% likely to fall in 'helpfulness' range of 69.6% to 79.3%
   3) 3/4 (75%) = 95% likely to fall in 'helpfulness' range of 24.7% to 97.5%
   4) 2/2 (100%) = 95% likely to fall in 'helpfulness' range of 23.7% to 100%
   5) 1/1 (100%) = 95% likely to fall in 'helpfulness' range of 13.3% to 100%

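Intuitively, this feels right.

Update 2:

From an application point of view, I don't want to run these calculations every time I pull up a list of posts. I'm thinking I'll store the Agresti-Coull lower bound and update it either on a regular, cron-driven schedule (updating only those posts that have received a vote since the last run) or whenever a new vote is received.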
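Both strategies from Update 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Post` record, its field names, and the functions `record_vote` and `refresh_dirty` are hypothetical, and `score_fn` stands for whichever scoring function ends up being used (it takes total votes and helpful votes).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ScoreFn = Callable[[int, int], float]  # (total_votes, helpful_votes) -> score

@dataclass
class Post:  # hypothetical record; field names are illustrative only
    helpful: int = 0
    total: int = 0
    score: float = 0.0   # stored ranking score, read when listing posts
    dirty: bool = False  # True if voted on since the last batch run

def record_vote(post: Post, is_helpful: bool,
                score_fn: Optional[ScoreFn] = None) -> None:
    """Count a vote; rescore immediately if score_fn is given, else flag for the batch job."""
    post.total += 1
    post.helpful += int(is_helpful)
    if score_fn is not None:
        post.score = score_fn(post.total, post.helpful)
    else:
        post.dirty = True

def refresh_dirty(posts: List[Post], score_fn: ScoreFn) -> int:
    """Batch job (e.g. run from cron): rescore only posts voted on since the last run."""
    updated = 0
    for post in posts:
        if post.dirty:
            post.score = score_fn(post.total, post.helpful)
            post.dirty = False
            updated += 1
    return updated
```

Either way, listing posts only reads the stored `score`, so the interval arithmetic runs once per vote or once per batch rather than once per page view.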

score 5 · Accepted Answer

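For each post, put bounds on how helpful you expect it to be. I prefer to use the Agresti-Coull interval. In C:

#include <math.h>

/* Lower bound of the Agresti-Coull interval for k helpful votes out of n. */
float AgrestiCoullLower(int n, int k) {
  // In general, kappa = ierfc(conf/2)*sqrt(2); conf = 0.05 gives the value below.
  // Note: this is the normal quantile at 0.9875; the usual two-sided 95% value is 1.95996398.
  float kappa = 2.24140273f;
  float kappa2 = kappa * kappa;        // '^' is XOR in C, so square explicitly
  float kest = k + kappa2 / 2;         // adjusted number of helpful votes
  float nest = n + kappa2;             // adjusted total number of votes
  float pest = kest / nest;            // adjusted helpful proportion
  float radius = kappa * sqrtf(pest * (1 - pest) / nest);
  return fmaxf(0.0f, pest - radius);   // Lower bound
  // Upper bound is fminf(1.0f, pest + radius)
}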

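Then take the lower bound of the estimate and sort on that. So 2/2 is (according to Agresti-Coull) 95% likely to fall in the 'helpfulness' range of 23.7% to 100%, so it sorts below 999/1000, whose range is 99.2% to 100% (since 0.237 < 0.992).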
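As a quick check, here is a minimal Python sketch of the same calculation applied to the five posts from the question. The function name `agresti_coull_lower` and the `(helpful, total)` tuple layout are choices made for this illustration; the kappa value is the one from the answer.

```python
import math

KAPPA = 2.24140273  # value used in the answer above

def agresti_coull_lower(n: int, k: int, kappa: float = KAPPA) -> float:
    """Lower bound of the Agresti-Coull interval for k helpful votes out of n."""
    k_est = k + kappa ** 2 / 2   # adjusted helpful votes
    n_est = n + kappa ** 2       # adjusted total votes
    p_est = k_est / n_est        # adjusted proportion
    radius = kappa * math.sqrt(p_est * (1 - p_est) / n_est)
    return max(0.0, p_est - radius)

# (helpful votes, total votes) for the five posts in the question
posts = [(1, 1), (2, 2), (999, 1000), (3, 4), (299, 400)]
ranked = sorted(posts, key=lambda kn: agresti_coull_lower(kn[1], kn[0]), reverse=True)
for k, n in ranked:
    print(f"{k}/{n}: lower bound {agresti_coull_lower(n, k):.3f}")
```

With this kappa the order comes out as 999/1000, 299/400, 3/4, 2/2, 1/1, matching the ordering given in the question's update.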

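EDIT: Since some people seem to find this helpful (ha ha), let me note that the algorithm can be tweaked depending on how confident or risk-averse you want to be. The less confidence you need, the more willing you will be to abandon the "proven" (high-vote) reviews in favor of the untested but high-scoring ones. With the kappa formula above, a 90% confidence interval gives kappa = 1.95996398, an 85% confidence interval gives 1.78046434, a 75% confidence interval gives 1.53412054, and, throwing caution to the wind, a 50% confidence interval gives 1.15034938.

The 50% confidence interval gives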

   1) 999/1000 (99.7%) = 50% likely to fall in 'helpfulness' range of 99.7% to 100%
   2) 299/400 (72.2%) = 50% likely to fall in 'helpfulness' range of 72.2% to 77.2%
   3) 2/2 (54.9%) = 50% likely to fall in 'helpfulness' range of 54.9% to 100%
   4) 3/4 (45.7%) = 50% likely to fall in 'helpfulness' range of 45.7% to 91.9%
   5) 1/1 (37.5%) = 50% likely to fall in 'helpfulness' range of 37.5% to 100%

which isn't that different overall, but it does prefer the 2/2 to the safety of the 3/4.
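The kappa values listed in the edit can be reproduced with Python's standard library. Since erfc(x) = 2(1 - Φ(x·sqrt(2))), the answer's formula kappa = ierfc(conf/2)*sqrt(2) is the standard normal quantile at 1 - conf/4. The helper name `kappa_from_conf` is made up for this sketch.

```python
from statistics import NormalDist

def kappa_from_conf(conf: float) -> float:
    """kappa = ierfc(conf/2) * sqrt(2), written as a standard normal quantile."""
    return NormalDist().inv_cdf(1 - conf / 4)

# conf = 1 - (confidence level as labeled in the answer)
for conf in (0.05, 0.10, 0.15, 0.25, 0.50):
    print(f"conf = {conf:.2f} -> kappa = {kappa_from_conf(conf):.8f}")
```

Note that the more common two-sided convention takes the quantile at 1 - conf/2 instead. Under that convention 1.95996398 is the 95% value and 2.24140273 corresponds to 97.5%, so the confidence labels in this answer are on the conservative side.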

score 4 · Accepted Answer

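This question is probably better asked on http://stats.stackexchange.com.

I'm guessing you still want to order by increasing "helpfulness".

If you want to know how precise a given number is, the easiest way is to use the square root of the variance of the binomial distribution, with n equal to the total number of responses and p the proportion of responses that were "helpful".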
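A minimal Python sketch of that precision estimate follows. A binomial count has variance n·p·(1-p), so the observed fraction k/n has standard error sqrt(p(1-p)/n). The function name `helpful_std_error` is an assumption of this example.

```python
import math

def helpful_std_error(n: int, k: int) -> float:
    """Standard error of the observed helpful fraction p = k/n."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)

# (helpful votes, total votes) for the posts in the question
for k, n in [(1, 1), (2, 2), (3, 4), (299, 400), (999, 1000)]:
    print(f"{k}/{n}: p = {k / n:.3f}, std. error = {helpful_std_error(n, k):.4f}")
```

One caveat: for unanimous posts such as 1/1 and 2/2 the estimate is exactly 0, so this plain formula cannot by itself express how uncertain a tiny sample is.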

score 1 · Accepted Answer

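A very simple solution is to ignore everything with fewer than a cutoff number of votes, and then sort by percentage.

For example (requiring at least five votes)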

   1.  99.9% (1000 votes)
   2.  74.8%  (400 votes)
   3-5.  waiting for five votes
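The cutoff approach could look like this in Python. It is a sketch only: the `(helpful, total)` tuple layout, the name `rank_with_cutoff`, and the `MIN_VOTES` constant are illustrative choices.

```python
MIN_VOTES = 5  # cutoff from the example above

def rank_with_cutoff(posts, min_votes=MIN_VOTES):
    """Split posts into (ranked, waiting).

    ranked:  posts with at least min_votes votes, best helpful fraction first
    waiting: posts that do not yet have enough votes, in their original order
    """
    ranked = sorted((p for p in posts if p[1] >= min_votes),
                    key=lambda p: p[0] / p[1], reverse=True)
    waiting = [p for p in posts if p[1] < min_votes]
    return ranked, waiting

# (helpful votes, total votes) for the five posts in the question
posts = [(1, 1), (2, 2), (999, 1000), (3, 4), (299, 400)]
ranked, waiting = rank_with_cutoff(posts)
print("ranked: ", ranked)
print("waiting:", waiting)
```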

score 1 · Accepted Answer

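It depends on the expected rate of positive feedback and the number of people who vote on average. If, as in the example you give, sometimes 5 or 10 people vote and other times 1000, then I would suggest the Wilson midpoint: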

(x+z^2/2)/(n+z^2)    The midpoint of the Adjusted Wald Interval / Wilson Score

where:
n = Sum(all_votes)
x = Sum(positive_votes)
z = 1.96 (fixed value)
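Here is a short Python sketch of the midpoint formula, taking x as the count of positive votes (which is what the form (x+z^2/2)/(n+z^2) requires) and n as the total number of votes. The function name `wilson_midpoint` is chosen for this example.

```python
def wilson_midpoint(n: int, x: int, z: float = 1.96) -> float:
    """Midpoint of the Wilson score interval: (x + z^2/2) / (n + z^2)."""
    return (x + z * z / 2) / (n + z * z)

# (positive votes, total votes) for the five posts in the question
posts = [(1, 1), (2, 2), (999, 1000), (3, 4), (299, 400)]
ranked = sorted(posts, key=lambda p: wilson_midpoint(p[1], p[0]), reverse=True)
for x, n in ranked:
    print(f"{x}/{n}: midpoint {wilson_midpoint(n, x):.3f}")
```

The midpoint pulls every score toward 50%, and the pull is stronger the fewer votes there are, so with z = 1.96 the order is 999/1000, 299/400, 2/2, 3/4, 1/1.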
