
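I have computed a vector of frequencies of different events, expressed as fractions and sorted in descending order. I need to interface with a tool that requires positive integer percentages that sum to exactly 100. I would like to generate the percentages in the way that best represents the input distribution. That is, I want the relationships (ratios) among the percentages to match the ratios among the input fractions as closely as possible, even though cutting off the long tail will introduce some nonlinearity.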

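I have a function that generates these percentages, but I don't think it is optimal or elegant. In particular, I would like to do more of the work in numeric space before resorting to "stupid integer tricks".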

Here is an example frequency vector:

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9,358)))
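
As a quick sanity check (a sketch using only base R), the example vector can be confirmed to have 362 elements, to sum to 1, and to be in non-increasing order, which are the properties the function below assumes:

```r
# Example vector from the question
fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))

length(fractionals)             # 362 elements
sum(fractionals)                # 1 (every term is an exact binary fraction)
!is.unsorted(rev(fractionals))  # TRUE: reversed vector is non-decreasing
```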

Here is my function:

# Convert vector of fractions to integer percents summing to 100
percentize <- function(fractionals) {
  # fractionals is sorted descending and adds up to 1
  # drop elements that wouldn't round up to 1% vs. running total
  pctOfCum <- fractionals / cumsum(fractionals)
  fractionals <- fractionals[pctOfCum > 0.005]

  # calculate initial percentages
  percentages <- round((fractionals / sum(fractionals)) * 100)

  # if sum of percentages exceeds 100, remove proportionally
  i <- 1
  while (sum(percentages) > 100) {
    excess <- sum(percentages) - 100
    if (i > length(percentages)) {
      i <- 1
    }
    partialExcess <- max(1, round((excess * percentages[i]) / 100))
    percentages[i] <- percentages[i] - min(partialExcess,
                                           percentages[i] - 1)
    i <- i + 1
  }

  # if sum of percentages shorts 100, add proportionally
  i <- 1
  while (sum(percentages) < 100) {
    shortage <- 100 - sum(percentages)
    if (i > length(percentages)) {
      i <- 1
    }
    partialShortage <- max(1, round((shortage * percentages[i]) / 100))
    percentages[i] <- percentages[i] + partialShortage
    i <- i + 1
  }

  return(percentages)
}

Any ideas?


1 Answer


How about this? It rescales the values so that they add up to 100, and if rounding leaves the total at 99, it adds 1 to the largest frequency.

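fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))
# drop the long tail, as in the question
pctOfCum <- fractionals / cumsum(fractionals)
fractionals <- fractionals[pctOfCum > 0.005]

# truncate to whole percents and add 1, so every element is at least 1
bunnies <- as.integer(fractionals / sum(fractionals) * 100) + 1
# rescale the values above 1 to fill whatever the 1s leave out of 100
bunnies[bunnies > 1] <- round(bunnies[bunnies > 1] *
  (100 - sum(bunnies[bunnies == 1])) / sum(bunnies[bunnies > 1]))
# if rounding left the total short of 100, add 1 to the largest value
if (sum(bunnies) < 100) bunnies[1] <- bunnies[1] + 1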

> bunnies
 [1] 45  6  3  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[26]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
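
Note that the final adjustment above only covers a total that lands exactly one short of 100. As a more general sketch (a different technique, not the code above), one option is largest-remainder rounding with a floor of 1: pin every element whose proportional share falls below 1 to exactly 1, re-share the remaining budget among the rest, and hand out the leftover units by largest fractional part. The name `percentize_lr` and the `total` argument are hypothetical; the 0.005 tail cut is taken from the question.

```r
# Hypothetical helper, not from the original post: largest-remainder
# rounding in which every kept element receives at least 1.
percentize_lr <- function(fractionals, total = 100L) {
  # same tail cut as in the question (input assumed sorted descending)
  x <- fractionals[fractionals / cumsum(fractionals) > 0.005]
  n <- length(x)
  stopifnot(n >= 1, n <= total)

  # pin elements whose proportional share is below 1 to exactly 1,
  # then re-share the remaining budget; repeat until nothing new is pinned
  pinned <- rep(FALSE, n)
  repeat {
    budget <- total - sum(pinned)
    ideal <- x / sum(x[!pinned]) * budget
    newly <- !pinned & ideal < 1
    if (!any(newly)) break
    pinned <- pinned | newly
  }

  out <- rep(1L, n)
  free <- which(!pinned)
  if (length(free) > 0) {
    # floor the unpinned shares, then give the leftover units to the
    # elements with the largest fractional parts
    base <- floor(ideal[free])
    leftover <- budget - sum(base)
    bump <- order(ideal[free] - base, decreasing = TRUE)[seq_len(leftover)]
    base[bump] <- base[bump] + 1
    out[free] <- as.integer(base)
  }
  out
}

fractionals <- 1 / (2 ^ c(2, 5:6, 8, rep(9, 358)))
res <- percentize_lr(fractionals)
sum(res)  # 100
```

For the example vector this sketch keeps 49 elements and starts with 45, 6, 3, 1, the same values as the output above, but the total is 100 by construction rather than by a one-off correction.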
answered 2014-06-26T21:03:47.433