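This function takes a list of labels and computes its Gini index. The Gini index is defined as 1 minus the sum of the squared probabilities of each class.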
Input
- values: a list of labels.
Output
- impurity: gini index of the list.
def gini(values):
    height, area = 0, 0
    for value in values:
        height += value
        area += height - value / 2.
    fair_area = height * len(values) / 2.
    impurity = (fair_area - area) / fair_area
    return impurity
The impurity should be 0.4082 for gini([0,0,0,0,0,1,1]) (it currently returns 0.7142)
and 0.5 for gini([0,0,1,1]) (this case works).
How can I compute it correctly so that it returns these expected results?
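
For comparison, here is a minimal sketch of the formula as stated in the description (1 minus the sum of squared class probabilities). The name `gini_impurity` is hypothetical, and treating an empty list as having impurity 0 is an assumption. The posted loop appears to accumulate an area under a cumulative sum, which looks like the Lorenz-curve Gini coefficient (an inequality measure) rather than this impurity; the two happen to coincide for [0,0,1,1].

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty list is treated as pure
    counts = Counter(labels)  # number of occurrences of each class label
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(round(gini_impurity([0, 0, 0, 0, 0, 1, 1]), 4))  # 0.4082
print(round(gini_impurity([0, 0, 1, 1]), 4))           # 0.5
```

For [0,0,0,0,0,1,1] this evaluates 1 - (5/7)^2 - (2/7)^2 = 20/49, which matches the expected 0.4082.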