r - Recursive sampling in R

Question

I am trying to simulate deaths over 7 years using the following cumulative probabilities:

tab <- data.frame(id=1:1000,char=rnorm(1000,7,4))

cum.prob <- c(0.05,0.07,0.08,0.09,0.1,0.11,0.12)

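How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? An id sampled in year 1 must not be sampled again in year 2, so lapply(cum.prob, function(x) sample(tab$id, x*1000)) will not work. Is it possible to vectorize this?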

//M

score 7 · Accepted Answer

Here is one approach: first let probYrDeath be the probability that a given individual dies in a given year, i.e. probYrDeath[i] = Prob( individual dies in year i ), where i = 1, 2, ..., 7.

probYrDeath <- diff(c(0, cum.prob))
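
As a quick check of that step, here is a minimal sketch. The name probSurvive is introduced only for illustration and is not part of the original answer. The yearly probabilities are the successive differences of cum.prob, and together with the probability of surviving all 7 years they should sum to 1:

```r
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Per-year death probabilities: successive differences of the cumulative values
probYrDeath <- diff(c(0, cum.prob))

# Hypothetical helper name: probability of surviving all 7 years
probSurvive <- 1 - sum(probYrDeath)

# The 7 death probabilities plus the survival probability form a full distribution
stopifnot(isTRUE(all.equal(sum(probYrDeath) + probSurvive, 1)))

print(round(probYrDeath, 2))  # 0.05 0.02 0.01 0.01 0.01 0.01 0.01
print(round(probSurvive, 2))  # 0.88
```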

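Now generate a random sample of 1000 "death years" from the sequence 1:8, according to the probabilities in probYrDeath together with the probability of not dying within the 7 years: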

set.seed(1) ## for reproducibility
tab$DeathYr <- sample( 8, 1000, replace = TRUE, 
                       prob = c(probYrDeath, 1-sum(probYrDeath)))

We interpret DeathYr = 8 as "does not die within 7 years" and extract the subset of tab where DeathYr != 8:

tab_sample <- subset(tab, DeathYr != 8 )

You can verify that the cumulative proportion of deaths in each year is close to the values in cum.prob:

> cumsum(table(tab_sample$DeathYr)/1000)
    1     2     3     4     5     6     7 
0.045 0.071 0.080 0.094 0.105 0.115 0.124
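
If what you need is a per-year list of ids (closer to what the lapply attempt in the question would have returned), one possible sketch is to split the sampled ids by DeathYr. The name ids_by_year is made up for this example, and the earlier steps are repeated so the block runs on its own. Because set.seed is called before rnorm here, the draws will not match the output shown above.

```r
set.seed(1)
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

probYrDeath <- diff(c(0, cum.prob))
tab$DeathYr <- sample(8, 1000, replace = TRUE,
                      prob = c(probYrDeath, 1 - sum(probYrDeath)))
tab_sample <- subset(tab, DeathYr != 8)

# One vector of ids per death year (hypothetical name)
ids_by_year <- split(tab_sample$id, tab_sample$DeathYr)

# Each id gets exactly one DeathYr, so no id can show up in two years
stopifnot(!anyDuplicated(unlist(ids_by_year)))

# Number of deaths in each year
lengths(ids_by_year)
```

Since every individual is assigned a single death year in one call to sample, the "no resampling across years" constraint holds by construction rather than needing a loop.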

score 0 · Accepted Answer

Does this work for you:

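# First entry: probability of surviving all years; remaining entries: per-year death probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], cum.prob - c(0, cum.prob[-length(cum.prob)]))
# Draw the number of deaths in each year and drop the survivor count
dead.in.years <- as.vector(rmultinom(1, length(tab$id), prob.death.per.year))[-1]
totsamp <- sum(dead.in.years)
# Sample that many ids without replacement and assign them to years in order
data.frame(id = sample(tab$id, totsamp), dead.after = rep(seq_along(dead.in.years), dead.in.years))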

Depending on the form you want the result in, you can change the last step.
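
For instance, one possible variant of the last step (a sketch only; deaths and tab_with_deaths are names invented here) joins the result back onto tab, so that survivors keep their row with dead.after set to NA. It uses diff(c(0, cum.prob)) as a shorthand for the per-year differences, which gives the same values as the expression above:

```r
set.seed(1)
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Survival probability first, then the per-year death probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], diff(c(0, cum.prob)))
dead.in.years <- as.vector(rmultinom(1, nrow(tab), prob.death.per.year))[-1]

# Hypothetical name: one row per individual who dies within the 7 years
deaths <- data.frame(id = sample(tab$id, sum(dead.in.years)),
                     dead.after = rep(seq_along(dead.in.years), dead.in.years))

# Left join: survivors keep their row and get dead.after = NA
tab_with_deaths <- merge(tab, deaths, by = "id", all.x = TRUE)
table(tab_with_deaths$dead.after, useNA = "ifany")
```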
