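My take on this problem is a bit different. When an item has only a few reviews, the remaining reviews are unknown and could fall anywhere between 1 and 10. So we can fill the missing range with randomly distributed scores and take the average over the whole population, sized to the largest review count.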
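>>> from operator import itemgetter
>>> from random import randint
>>> # rating: the list of (rating, review_count) pairs from the question
>>> max_freq = max(rating, key=itemgetter(1))[-1]  # largest review count
>>> for r, f in rating:
...     missing = max_freq - f  # reviews this item lacks relative to the most-reviewed one
...     actual_rating = r
...     if missing:
...         # mean of `missing` random scores in 1..10, rescaled to 0.1..1.0;
...         # note that this replaces r rather than blending it in
...         actual_rating = sum(randint(1, 10) for e in range(missing)) / (10.0 * missing)
...     print("Original Rating {}, Scaled Rating {}".format(r, actual_rating))
...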
Original Rating 0.7, Scaled Rating 0.550225179856
Original Rating 0.75, Scaled Rating 0.550952554745
Original Rating 0.89, Scaled Rating 0.89
Original Rating 1, Scaled Rating 0.54975249116
Original Rating 0.7, Scaled Rating 0.550576978417
Original Rating 0.75, Scaled Rating 0.549582481752
Original Rating 0.89, Scaled Rating 0.89
Original Rating 1, Scaled Rating 0.550458230651
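
In both runs above, every item that needed padding lands near 0.55, because the loop averages only the simulated scores and drops the observed rating. If the intent is to average over the entire padded population (observed reviews plus simulated ones), a minimal sketch could look like the following. The function name `scaled_ratings`, the `rng` parameter, and the review counts in `sample` are hypothetical and only there for illustration; they are not taken from the question.

```python
from operator import itemgetter
import random


def scaled_ratings(rating, rng=random):
    """Pad each item up to the largest review count with random scores, then average.

    rating: list of (score in 0..1, review_count) pairs; at least one count must be > 0.
    Returns a list of (original_score, scaled_score) pairs.
    """
    max_freq = max(rating, key=itemgetter(1))[1]
    result = []
    for r, f in rating:
        missing = max_freq - f
        # Unknown reviews are simulated as uniform integer scores in 1..10,
        # rescaled to 0.1..1.0 (this is the sum of the rescaled scores).
        simulated = sum(rng.randint(1, 10) for _ in range(missing)) / 10.0
        # Average over the padded population: f observed reviews at score r
        # plus `missing` simulated reviews.
        result.append((r, (r * f + simulated) / max_freq))
    return result


# Hypothetical review counts, for illustration only.
sample = [(0.7, 10), (0.75, 20), (0.89, 1000), (1.0, 1)]
for original, scaled in scaled_ratings(sample, random.Random(0)):
    print("Original Rating {}, Scaled Rating {:.3f}".format(original, scaled))
```

With this weighting, an item that already has the maximum number of reviews keeps its rating, while an item with very few reviews is pulled toward 0.55, the mean of the simulated scores, in proportion to how many reviews it is missing.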