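I have computed a vector of frequencies for different events, expressed as fractions and sorted in descending order. I need to interface with a tool that requires positive integer percentages that sum to exactly 100. I would like to generate the percentages in the way that best represents the input distribution. That is, I want the relationships (ratios) between the percentages to match those between the input fractions as closely as possible, even though any nonlinearity will end up chopping off the long tail.
I have a function that generates these percentages, but I don't think it is optimal or elegant. In particular, I would like to do more of the work in numeric space before resorting to "stupid integer tricks".
Here is an example frequency vector: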
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
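To make the difficulty concrete, here is a small sketch that inspects the example vector. The variable name `tail_mass` is introduced only for this illustration. The 358 smallest entries are each about 0.2% of the total, so none of them rounds to 1% on its own, yet together they hold roughly 70% of the mass.

```r
# Example vector from the question
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))

# Number of events and total mass (the total should be 1)
length(fractionals)
sum(fractionals)

# Combined share of the entries equal to 1/512 (the long tail);
# tail_mass is a name used only for this illustration
tail_mass <- sum(fractionals[fractionals == 1 / 512])
tail_mass

# A single tail entry expressed as a percentage
(1 / 512) * 100
```

Any scheme that produces positive integer percentages therefore has to drop most of the tail, since 362 entries of at least 1% each cannot sum to 100.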
Here is my function:
# Convert vector of fractions to integer percents summing to 100
percentize <- function(fractionals) {
  # fractionals is sorted descending and adds up to 1
  # drop elements that wouldn't round up to 1% vs. running total
  pctOfCum <- fractionals / cumsum(fractionals)
  fractionals <- fractionals[pctOfCum > 0.005]
  # calculate initial percentages
  percentages <- round((fractionals / sum(fractionals)) * 100)
  # if sum of percentages exceeds 100, remove proportionally
  i <- 1
  while (sum(percentages) > 100) {
    excess <- sum(percentages) - 100
    if (i > length(percentages)) {
      i <- 1
    }
    partialExcess <- max(1, round((excess * percentages[i]) / 100))
    percentages[i] <- percentages[i] - min(partialExcess, percentages[i] - 1)
    i <- i + 1
  }
  # if sum of percentages shorts 100, add proportionally
  i <- 1
  while (sum(percentages) < 100) {
    shortage <- 100 - sum(percentages)
    if (i > length(percentages)) {
      i <- 1
    }
    partialShortage <- max(1, round((shortage * percentages[i]) / 100))
    percentages[i] <- percentages[i] + partialShortage
    i <- i + 1
  }
  return(percentages)
}
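For comparing alternatives, it may help to have the output contract written down as a check. The following is a minimal sketch; the helper name `is_valid_percents` is hypothetical and not part of the function above. It only tests the hard constraints the tool imposes (whole numbers, each at least 1, summing to exactly 100), not how well the ratios are preserved.

```r
# Hypothetical helper (an assumption for illustration): TRUE when `p` is a
# non-empty numeric vector of whole numbers, each at least 1, summing to 100.
is_valid_percents <- function(p) {
  is.numeric(p) && length(p) > 0 && !anyNA(p) &&
    all(p == round(p)) && all(p >= 1) && sum(p) == 100
}

is_valid_percents(c(50, 30, 20))   # whole, positive, sums to 100
is_valid_percents(c(50.5, 49.5))   # sums to 100 but not whole numbers
is_valid_percents(c(100, 0))       # sums to 100 but contains a zero
```

With the example vector and `percentize()` both defined, `is_valid_percents(percentize(fractionals))` would apply the same check to the function's output.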
Any ideas?