r - Drawing from two distributions with given probabilities in R

Question

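I am trying to draw 100,000 times from two different distributions, picking each one with a given probability. Unfortunately I can't see what is wrong with my for loop: it only adds 1 value to simulated_data instead of the desired 100,000 values.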

Question 1: How can I fix this?

Question 2: Is there a more efficient way that doesn't require looping over 100,000 elements?

#creating a vector of probabilities
probabilities <- rep(0.99,100000)
#creating a vector of booleans
logicals <- runif(length(probabilities)) < probabilities

#empty vector for my simulated data
simulated_data <- c()

#drawing from two different distributions depending on the value in logicals
for(i in logicals){

  if (isTRUE(i)) {
    simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 1)
  } else {
    simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 10)
  }
}
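For question 1, here is a minimal sketch of one possible fix that keeps the question's variable names. `for (i in logicals)` iterates over the TRUE/FALSE values themselves rather than over their positions, so `simulated_data[i]` is indexed by a logical. `simulated_data[TRUE] <- x` writes to every existing position (creating a single element the first time), and `simulated_data[FALSE] <- x` writes nothing, so the vector never grows past one element. Looping over `seq_along(logicals)` and preallocating the result avoids both problems:

```r
probabilities <- rep(0.99, 100000)
logicals <- runif(length(probabilities)) < probabilities

# Preallocate the result instead of growing it inside the loop
simulated_data <- numeric(length(logicals))

# Loop over positions 1, 2, ..., n, not over the TRUE/FALSE values
for (i in seq_along(logicals)) {
  if (logicals[i]) {
    simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 1)
  } else {
    simulated_data[i] <- rnorm(n = 1, mean = 0, sd = 10)
  }
}

length(simulated_data)  # 100000
```

Growing the vector with `c()` would also work once the indexing is fixed, but preallocating with `numeric()` avoids copying the vector on every iteration.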

score 1 · Accepted Answer

Create a vector containing the desired fraction of values from each distribution, then take a random permutation of those values:

N = 10000
frac = 0.99
# frac*N draws from N(0, 1) and (1-frac)*N draws from N(0, 10); sample() shuffles them
rand_mix = sample(c(rnorm(frac * N, 0, sd = 1), rnorm((1 - frac) * N, 0, sd = 10)))

> table( abs(rand_mix) >1.96)

FALSE  TRUE 
 9364   636 
> (100000-636)/100000
[1] 0.99364
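As a reference point for the `abs(rand_mix) > 1.96` table above, the expected share of such values in a 99/1 mixture of N(0, 1) and N(0, 10) can be computed directly with `pnorm()`. This is a sketch of the theoretical value, not part of the original answer:

```r
frac <- 0.99

# Two-sided tail probability P(|X| > 1.96) for each component
p_tail_sd1  <- 2 * pnorm(-1.96, mean = 0, sd = 1)
p_tail_sd10 <- 2 * pnorm(-1.96, mean = 0, sd = 10)

# Mixture tail probability: weighted average of the component tails
expected_tail <- frac * p_tail_sd1 + (1 - frac) * p_tail_sd10
expected_tail  # about 0.058
```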

> table( rnorm(10000) >6)

FALSE 
10000
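To make the fixed composition visible, one can keep a label for each draw and permute values and labels with the same index. This is a minimal sketch. The names `component` and `perm` are illustrative, and `round()` is used so the component sizes are guaranteed to be whole numbers:

```r
N <- 10000
frac <- 0.99
n1 <- round(frac * N)  # number of draws from N(0, 1)

values <- c(rnorm(n1, 0, sd = 1), rnorm(N - n1, 0, sd = 10))
component <- rep(c("sd1", "sd10"), times = c(n1, N - n1))

# One random permutation applied to both vectors keeps values and labels aligned
perm <- sample.int(N)
rand_mix <- values[perm]
component <- component[perm]

table(component)  # always 9900 "sd1" and 100 "sd10", whatever the random seed
```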

The fraction here is fixed. If you want a fraction that varies randomly (but is statistically close to 0.99), try this:

> table( sample( c( rnorm(10e6), rnorm(10e4, sd=10) ), 10e4) > 1.96 )

FALSE  TRUE 
97151  2849
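In R, `10e6` is 1e7 and `10e4` is 1e5, so the pool above holds the two components in a 100:1 ratio (fraction 100/101, about 0.990), and 100,000 values are drawn from it without replacement. An alternative sketch with the same goal, not taken from the answer, first draws the number of sd = 1 values from a binomial distribution and then mixes as before. This gives the same distribution as choosing a component independently for each element:

```r
N <- 100000
p <- 0.99

# Random number of draws from N(0, 1); its expected value is N * p
n1 <- rbinom(1, size = N, prob = p)

# The remaining N - n1 draws come from N(0, 10); sample() shuffles the result
rand_mix <- sample(c(rnorm(n1, 0, sd = 1), rnorm(N - n1, 0, sd = 10)))
table(rand_mix > 1.96)
```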

Compare with:

> N = 100000
> frac =0.99
> rand_mix = sample( c( rnorm( frac*N, 0, sd=1) , rnorm( (1-frac)*N, 0, sd=10) ) )
> table( rand_mix > 1.96 )

FALSE  TRUE 
97117  2883

score 0 · Accepted Answer

For anyone who lands here, this is a good solution:

n <- 100000
prob1 <- 0.99
prob2 <- 1-prob1 

dist1 <- rnorm(prob1*n, 0, 1)
dist2 <- rnorm(prob2*n, 0, 10)

actual_sample <- c(dist1, dist2)
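Note that `actual_sample` above is ordered: all draws from `dist1` come first, followed by all draws from `dist2`. If the order matters (for example, if the values are later split or processed in sequence), a random permutation with `sample()`, as in the first answer, removes that structure. Here is a minimal sketch that uses `round()` so both counts are guaranteed whole numbers:

```r
n <- 100000
prob1 <- 0.99
n1 <- round(prob1 * n)  # 99000 draws from N(0, 1)
n2 <- n - n1            # 1000 draws from N(0, 10)

actual_sample <- c(rnorm(n1, 0, 1), rnorm(n2, 0, 10))

# Shuffle so that position in the vector carries no information about the source
actual_sample <- sample(actual_sample)
```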

score 0 · Accepted Answer

It seems you want to create a final sample in which each element is drawn at random from either sample1 or sample2, with probabilities 0.99 and 0.01 respectively.

The right approach is to generate two samples of the same length, then pick each element at random from one or the other.

Here is how:

# Generate both samples
n = 100000
sample1 = rnorm(n,0,1)
sample2 = rnorm(n,0,10)

# Create the logical vector that will decide whether to take from sample 1 or 2
s1_s2 = runif(n) < 0.99

# Create the final sample
sample = ifelse(s1_s2 , sample1, sample2)
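A more compact variant of the same idea, offered as an alternative rather than a correction: `rnorm()` recycles its `mean` and `sd` arguments, so the per-element choice can be passed as a vector of standard deviations, and only n values are generated instead of 2n. The name `final_sample` is illustrative:

```r
n <- 100000
s1_s2 <- runif(n) < 0.99

# sd is 1 where s1_s2 is TRUE and 10 where it is FALSE; rnorm() uses one sd per draw
final_sample <- rnorm(n, mean = 0, sd = ifelse(s1_s2, 1, 10))

mean(s1_s2)  # share of draws from the sd = 1 component, close to 0.99
```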

In this case, there is no guarantee that exactly 0.99*n elements come from sample1 and exactly 0.01*n from sample2. In fact:

> sum(sample == sample1)
[1] 98953

As expected, this is close to 0.99*n, but not exactly equal to it.
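How close to expect can be quantified. Each element comes from sample1 independently with probability p = 0.99, so the count K of elements taken from sample1 follows a binomial distribution:

```latex
K \sim \operatorname{Binomial}(n, p), \qquad
\mathbb{E}[K] = np = 100000 \times 0.99 = 99000, \qquad
\operatorname{SD}(K) = \sqrt{np(1-p)} = \sqrt{990} \approx 31.5
```

The observed 98953 is 47 below the mean, or about 1.5 standard deviations, which is within ordinary random variation.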
