normalization - Normalizing achievements with multiple sources

Question

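I'm looking for a good algorithm recommendation.

I have users and achievements. Users create achievements and then give them to other users. Associated with each achievement is a point value that the creating user specifies. A user's total score is the sum of the points of all their achievements.

Basically: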

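from dataclasses import dataclass, field

@dataclass
class Achievement:
    owner: str   # alias of the user who created the achievement
    points: int  # point value chosen by the owner

@dataclass
class User:
    achievements: list[Achievement] = field(default_factory=list)

    def points(self) -> int:
        # A user's total score is the sum of the points of their achievements.
        return sum(a.points for a in self.achievements)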

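OK, so this system is obviously very gameable. You can create many accounts and give each other lots of achievements. I'm trying to reduce that a little by scaling the point values to something different from what the user specified.

Assume all users are honest, but they simply gauge difficulty differently. How should I normalize the point values? In other words, one user gives 5 points for every easy achievement and another gives 10; how can I normalize them to a single value? The goal is a distribution in which points are proportional to difficulty.
If a user is bad at judging point values, how can I gauge difficulty from the number of users who have earned the achievement?
Assume users can mostly be partitioned into disjoint groups, where one user gives achievements to a whole set of other users. Does this help with the previous two algorithms? For example, user A only gives achievements to users whose names end in an odd number, and user B only to users whose names end in an even number.
If everyone is malicious, how close can I get to making users unable to overinflate their point values?

Note: the quality of a giver has nothing to do with the achievements he has received. Many givers are just bots that receive nothing themselves but automatically reward users for certain actions.

My current plan is this. I get an allotment of 10 points for each person I have given an achievement to. If I have given out 10 achievements to 55 people in total, my allotment is 550. I then divide this among the achievements based on how many people have earned each one. If the numbers of people who have earned each achievement are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], the point values will be [50, 25, 16.6, 12.5, 10, 8.3, 7.1, 6.25, 5.5, 5].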
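
A rough Python sketch of one possible reading of this plan, not a definitive implementation: the giver's budget is 10 points per award handed out, the budget is split evenly across the giver's achievements, and each share is divided among the people who earned that achievement. The names `achievement_values` and `points_per_award` are made up for illustration. Under this reading each achievement is worth 55 / n for n earners, a little above the 50 / n figures quoted above, so the original numbers may rest on a slightly different split.

```python
def achievement_values(earner_counts, points_per_award=10):
    """Return the per-earner value of each achievement for one giver.

    earner_counts: number of people who earned each achievement (all > 0).
    points_per_award: hypothetical budget granted per award handed out.
    """
    if not earner_counts:
        return []
    total_awards = sum(earner_counts)
    budget = points_per_award * total_awards
    # Assumption: every achievement gets an equal share of the budget.
    share = budget / len(earner_counts)
    # Rarer achievements are worth more: the share is split among earners.
    return [share / n for n in earner_counts]


values = achievement_values(list(range(1, 11)))
print([round(v, 2) for v in values])
```

Whatever split is used, the points actually handed out never exceed the giver's budget, which is the property that limits inflation.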

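Any problems with my approach, and any alternative suggestions, are welcome and appreciated. Also, post any other cases you can think of that I have missed and I will add them to the list. Thanks!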

score 0 · Accepted Answer

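I've been struggling with this kind of problem on my own site. If you have a large amount of existing data to use as a baseline, score standardization seems to work very well. First get the mean and standard deviation of the achievements created by the user: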

SELECT AVG(Points) AS user_average, 
STDDEV_POP(Points) AS user_stddev
FROM Achievements WHERE Owner = X

Use these values to compute a context-free "z-score":

$zscore = ($rating - $user_average) / $user_stddev;

Get the mean and standard deviation of all achievements:

SELECT AVG(Points) AS all_average, 
STDDEV_POP(Points) AS all_stddev 
FROM Achievements

Use these values to create a standardized "t-score":

$tscore = $all_average + ($all_stddev * $zscore);

Then use the t-score as the internal representation of the achievement's value. YMMV. :)
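
For anyone who wants to try the steps above outside the database, here is a minimal Python sketch of the same calculation. The function name `standardize` and the sample data are assumptions for illustration. `statistics.pstdev` is the population standard deviation, matching `STDDEV_POP`. The sketch also guards against a creator whose achievements all carry the same value, which would otherwise divide by zero; treating that case as a z-score of 0 is my own assumption.

```python
from statistics import mean, pstdev


def standardize(rating, user_points, all_points):
    """Map one creator's raw point value onto the site-wide scale.

    rating: raw points the creator assigned to one achievement.
    user_points: points of every achievement by that creator.
    all_points: points of every achievement on the site.
    """
    user_avg, user_sd = mean(user_points), pstdev(user_points)
    all_avg, all_sd = mean(all_points), pstdev(all_points)
    # Assumption: a creator with no spread carries no relative information,
    # so the achievement is treated as average (z-score of 0).
    zscore = (rating - user_avg) / user_sd if user_sd > 0 else 0.0
    return all_avg + all_sd * zscore


generous = [50, 100, 150]             # hypothetical creator with big values
everyone = [5, 10, 15, 50, 100, 150]  # hypothetical site-wide values
print(standardize(100, generous, everyone))
```

The generous creator's middle achievement lands exactly on the site-wide mean, because it sits at that creator's own average.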

score 0 · Accepted Answer

I think that in your system, as in Stack Overflow, Digg, Slashdot, etc., your basic goals are to

Identify honest users
Promote their actions

Generally we identify honest users by their actions: those accounts that have existed for a long time on the site and have been vetted by other users, and by you. Stack Overflow uses its reputation score for this; Slashdot uses karma points.

Once you identify these honest users then you can have their votes count in proportion to the reputation score: the more honest a user seems to be the more we trust his achievements.

Thus, you might give new accounts an initial score of 10. That user can then give any number of achievements he wants, but their actual total value will be 10 (like the proportional allocation you suggest). That is, if a new user gives 100 achievements (all worth the same number of points), then each one will be worth 0.1 points because his score is 10. Then, as that user gets achievements from other users, his score increases.
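
A minimal Python sketch of that capping step, assuming the giver's raw point values are kept only as relative weights. The name `scaled_values` is hypothetical.

```python
def scaled_values(giver_score, raw_points):
    """Rescale a giver's achievements so their values sum to giver_score."""
    total = sum(raw_points)
    if total == 0:
        return [0.0 for _ in raw_points]
    # Each achievement keeps its relative weight, but the sum is capped.
    return [giver_score * p / total for p in raw_points]


# A new account (score 10) creating 100 equally weighted achievements.
values = scaled_values(10, [5] * 100)
print(values[0])
```

However many achievements the account creates, and however many points it claims for them, the values add up to its score of 10.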

Basically, I'm suggesting you use PageRank, but instead of ranking web pages you are ranking users, and instead of hyperlinks the links are achievements given by that user to others.
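
To make the idea concrete, here is a small power-iteration sketch in Python. It is an illustration under assumptions, not a tuned implementation: the function name `user_rank`, the damping factor of 0.85, the iteration count, and the sample users are all made up. Note that givers who receive nothing, such as the bots mentioned in the question, end up with only the baseline rank in this scheme.

```python
def user_rank(awards, damping=0.85, iterations=50):
    """Power-iteration PageRank over a giver -> receivers graph.

    awards: dict mapping each giver to the list of users they awarded.
    """
    users = set(awards) | {r for rs in awards.values() for r in rs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Rank held by users who award nobody is spread over everyone.
        dangling = sum(rank[u] for u in users if not awards.get(u))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in users}
        for giver, receivers in awards.items():
            if not receivers:
                continue
            # A giver passes on an equal slice of rank per award given.
            share = damping * rank[giver] / len(receivers)
            for r in receivers:
                new_rank[r] += share
        rank = new_rank
    return rank


# Hypothetical data: "bot" only gives, "carol" receives from two givers.
ranks = user_rank({"bot": ["alice", "carol"], "alice": ["carol"], "carol": []})
print(sorted(ranks, key=ranks.get, reverse=True))
```

The resulting rank could then serve as the score that caps the total value of each user's achievements.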

That's one way to solve this problem. There are many others. It depends on your specific needs. Auctions are always fun. You can have everyone bid on an achievement before it is actually achieved in order to establish the price (score) that the community places on that achievement. You will need to limit the amount of 'money' people have.

score 0 · Accepted Answer

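Correct, $rating is the input and $tscore is the standardized output.

Ideally, everyone would assign points to their achievements on the same scale. Silly or trivial achievements get one point, ordinary achievements get 10, truly epic achievements get 50, and so on. But people behave very differently when assigning points. Some will be very generous and make every achievement worth a lot. Others will be strict and accurate, carefully sticking to a scale tied to the difficulty of the achievement. Still others may think it is silly for people to worry about points and assign the minimum value to every achievement they create.

Standardization tries to handle these individual quirks and adjust everyone's ratings to the same scale. It's like the way judges' scores are handled at the Olympics. You wouldn't "blindly trust" the value a user assigns to an achievement, but if it is part of the system you need to take it into account. Otherwise you could simply hard-code the point values of achievements and limit how often they can be created, which sounds like it would curb the worst abuse. But the scores are useful, because after standardization you can work out what an achievement would be worth if it had been created by a typical, average user. That makes it hard for people to "game" the system, because the further they stray from the mean and distribution of achievement values, the more their own values are pulled back toward the baseline.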
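
A short Python illustration of why this evens out different personal scales. The three creators and the helper name `zscores` are made up. A strict creator and a generous creator who rank their achievements the same way end up with the same z-scores, and a creator who gives everything the minimum gets all zeros (the zero-spread handling is my own assumption).

```python
from statistics import mean, pstdev


def zscores(points):
    """Standardize one creator's point values (population std dev)."""
    avg, sd = mean(points), pstdev(points)
    # Assumption: a creator with zero spread gets z-scores of 0.
    return [(p - avg) / sd if sd > 0 else 0.0 for p in points]


strict = [1, 10, 50]        # trivial, ordinary, epic on the "ideal" scale
generous = [10, 100, 500]   # same judgments, ten times the points
flat = [1, 1, 1]            # minimum value for everything

print(zscores(strict))
print(zscores(generous))
print(zscores(flat))
```

Multiplying every value by ten changes nothing after standardization, so inflating all of your own achievements buys no advantage.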

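I should mention that I'm not a professionally trained programmer, and I've never taken a statistics class or any higher math. Given the limits of my own understanding, I may not be the best person to explain this. But I've been struggling with a similar problem on my own site (user-to-user ratings), and after trying several approaches this one seems the most promising. Much of the inspiration for the implementation came from http://www.ericdigests.org/2003-4/score-normilization.html, so you may want to read that as well.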
