r - Each row has a population, but I want a random person

Question

Suppose my data is structured like this:

      country population
1 Afghanistan   30000000
2      Brazil  200000000
3    Cameroon   22250000

There are about 252 million people represented here. Suppose I want to pick one person at random:

i <- sample(1:sum(df$population), 1)

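and then report her country. How do I find the country row that corresponds to person i? I know the rule of thumb is that iterating over a data frame means you are doing something wrong, but (short of creating a new list with one row per person, which sounds terrible) I can't think of a good way to work out where person i falls in the population.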

score 3 · Accepted Answer

As MrFlick suggested in the comments, you can sample a country directly, using each country's population as its sampling probability.

> pops <- read.table(text="country population
1 Afghanistan   30000000
2      Brazil  200000000
3    Cameroon   22250000", header=TRUE)

> sample(pops$country, 1, prob=pops$population)

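To show that this is proportional to population, repeat the draw many times: the proportions among the samples come out roughly equal to the proportions among the populations: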

> set.seed(42)
> countries <- replicate(100000, sample(pops$country, 1, prob=pops$population))
> table(countries)/sum(table(countries))
countries
Afghanistan      Brazil    Cameroon 
0.12058     0.79052     0.08890 

> pops$population/sum(pops$population)
[1] 0.11892963 0.79286422 0.08820614
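The same check can probably be done without `replicate()`, since `sample()` can draw many values at once when `replace = TRUE`. The following is a minimal sketch, assuming the three-row table above is rebuilt as a data frame; the names `draws`, `props`, and `expected` are illustrative and not part of the original answer.

```r
# Rebuild the example data so the block is self-contained
pops <- data.frame(
  country    = c("Afghanistan", "Brazil", "Cameroon"),
  population = c(30000000, 200000000, 22250000)
)

set.seed(42)

# One vectorized call: 100000 independent draws, weighted by population
draws <- sample(pops$country, 100000, replace = TRUE, prob = pops$population)

# Observed share per country, in the same order as the rows of pops
props <- prop.table(table(factor(draws, levels = pops$country)))

# Population share per country, for comparison
expected <- pops$population / sum(pops$population)

round(cbind(observed = as.numeric(props), expected = expected), 4)
```

`prob` does not need to sum to 1; `sample()` normalizes the weights itself, which is why the raw populations can be passed directly.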

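Another approach is to compute the cumulative sum of the populations, sample one person from the world population, and then work out which country that person belongs to: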

> pops$cumPop <- cumsum(pops$population)
> set.seed(42)
> person <- sample(1:pops$cumPop[nrow(pops)], 1)    
> pops$country[which(person <= pops$cumPop)[1]] # the country is the first row whose cumPop is >= the person ID
[1] Cameroon
Levels: Afghanistan Brazil Cameroon

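The first option is much simpler, but the second has the advantage of actually sampling a specific "someone", in case you need that person for something other than returning a country.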
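If several "someones" are needed at once, the cumulative-sum lookup could be vectorized with `findInterval()` instead of `which()`. This is a sketch under stated assumptions: `country_of` is a hypothetical helper name, person IDs are assumed to run from 1 to the total population, and `as.numeric()` is added because integer `cumsum()` would overflow for totals above about 2.1 billion.

```r
# Rebuild the example data so the block is self-contained
pops <- data.frame(
  country    = c("Afghanistan", "Brazil", "Cameroon"),
  population = c(30000000, 200000000, 22250000)
)

# Upper person ID for each country; as.numeric() avoids integer overflow
cum_pop <- cumsum(as.numeric(pops$population))
total   <- cum_pop[length(cum_pop)]

# Hypothetical helper: map person IDs (1..total) to their country.
# findInterval(x, v) counts how many elements of v are <= x, so
# shifting the ID by one gives the index of the first cum_pop >= person.
country_of <- function(person) {
  as.character(pops$country[findInterval(person - 1, cum_pop) + 1])
}

set.seed(42)
people <- sample(total, 5)  # five distinct person IDs
data.frame(person = people, country = country_of(people))
```

Because `cum_pop` is sorted, `findInterval()` does a binary search per ID, so this stays cheap even with many countries and many sampled people.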
