r - R - Random numbers with a distribution similar to the real data

Question

This is a very simplified example, but hopefully it gives everyone an idea of what I'm talking about:

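real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)

random.length <- numeric(length(real.length))  # preallocate the result
for (i in seq_along(real.length)) {
    # draw one integer uniformly from the observed range
    random.length[i] <- sample(min(real.length):max(real.length), 1)
}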

(Note: I know I could just write random.length = sample(min:max, 10), but I need to use a loop in my real code.)
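One detail about that shortcut: sample(min:max, 10) draws without replacement, so no value can repeat, whereas the loop makes independent draws in which repeats are possible. A minimal sketch of the vectorized form that matches the loop, reusing the example data (the names candidates, no_repeats and with_repeats are illustrative):

```r
real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)
candidates <- min(real.length):max(real.length)

set.seed(1)  # for reproducibility only

# Without replace = TRUE, sample() draws without replacement: no repeats
no_repeats <- sample(candidates, length(real.length))

# Independent uniform draws with repeats allowed, like the loop
with_repeats <- sample(candidates, length(real.length), replace = TRUE)
```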

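I want my random lengths to have not only a range similar to my real lengths but also a similar distribution. I tried rnorm, but my real data are not normally distributed, so I don't think that will work unless I'm missing some option.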

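Is it possible to set the probabilities of the sample function from my real data? In this case, that would mean giving a higher weight/probability to numbers between 10 and 15 and a lower weight/probability to high numbers like 50.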
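sample() does accept a prob argument with one weight per candidate value. A minimal sketch, assuming the lengths are integers and using a kernel density estimate from density() as the weights (the names candidates and weights are illustrative), while keeping the loop:

```r
real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)

# Candidate values: every integer between the observed min and max
candidates <- min(real.length):max(real.length)

# Weight each candidate by a kernel density estimate of the real data,
# linearly interpolated from density()'s grid onto the integers
d <- density(real.length)
weights <- approx(d$x, d$y, xout = candidates)$y

set.seed(1)  # for reproducibility only
random.length <- numeric(length(real.length))
for (i in seq_along(real.length)) {
  # sample() normalizes prob internally, so the weights need not sum to 1
  random.length[i] <- sample(candidates, 1, prob = weights)
}
random.length
```

With the default bandwidth the estimate is close to zero between 15 and 50, so values in that gap will almost never be drawn; passing a larger adjust to density() (for example density(real.length, adjust = 3)) spreads some weight further away from the observed values.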

EDIT: Using James's solution:

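samples <- length(real.length)
d <- density(real.length)                # kernel density estimate of the real data
cdf <- cumsum(d$y) / sum(d$y)            # approximate CDF on the density grid d$x
# draw 100 extra values, because any that fall at or below 0 are discarded
random.length <- d$x[findInterval(runif(samples + 100), cdf)]
random.length <- random.length[random.length > 0]   # keep positive lengths only
random.length <- random.length[1:samples]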
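Over-drawing by 100 and trimming usually works, but it does not guarantee that samples positive values remain, and it sidesteps the loop mentioned above. A minimal sketch that instead redraws any non-positive value inside the loop; draw_positive is a hypothetical helper name, and the + 1 on findInterval() is an adjustment of the original indexing so that a uniform draw below the first CDF value cannot yield index 0 (which R silently drops from the result):

```r
real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)

# Hypothetical helper: draw n values from the density estimate of x,
# redrawing any value that is not positive
draw_positive <- function(x, n = length(x)) {
  d <- density(x)
  cdf <- cumsum(d$y)
  cdf <- cdf / cdf[length(cdf)]  # last element is exactly 1
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      # + 1 picks the first grid point whose CDF exceeds u,
      # so the index always lies between 1 and length(d$x)
      val <- d$x[findInterval(runif(1), cdf) + 1]
      if (val > 0) break
    }
    out[i] <- val
  }
  out
}

set.seed(42)  # for reproducibility only
random.length <- draw_positive(real.length)
```

With this particular data the density grid starts above 0, so the redraw branch never fires; it only matters for data closer to zero.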

score 0 · Accepted Answer

You can create a density estimate and sample from it:

d <- density(real.length)
d$x[findInterval(runif(6),cumsum(d$y)/sum(d$y))]
[1] 13.066019 49.591973  9.636352 15.209561 11.951377 12.808794

Note that this assumes your variable is continuous, so round the results as you see fit.
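Sampling from the grid d$x approximates the density estimate. With the default Gaussian kernel you can also draw from the estimate exactly: pick an observation at random and add normal noise whose standard deviation is the bandwidth d$bw (often called a smoothed bootstrap). This is a swapped-in technique rather than the one in the answer; a minimal sketch, rounding at the end because the lengths are integers:

```r
real.length <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)
n <- 1000

# Bandwidth density() uses by default (Gaussian kernel, bw.nrd0)
bw <- density(real.length)$bw

# A draw from a Gaussian KDE = a randomly chosen observation plus N(0, bw^2) noise
set.seed(123)  # for reproducibility only
draws <- sample(real.length, n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)

# Round, since the real lengths are integers
random.length <- round(draws)
table(random.length)
```

Rounded values can land slightly outside the observed range (for example 9), so clip or redraw them if that matters.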

score 0

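Although I can read R, I can't write it (I don't have it installed, so I can't test anything). Here is a simple example in Matlab that does what you asked; I hope it inspires you: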

obs = sort([10 11 12 13 13 13 13 14 15 50]); % have to make sure they are sorted...
uo = unique(obs);                      % distinct observed values
hh = hist(obs, uo);                    % find frequencies of each value
cpdf = cumsum(hh);                     % cumulative counts (cumsum(obs) would be wrong)
cpdfn = cpdf / max(cpdf);              % normalized cumulative pdf
r = rand(1, 100);                      % 100 random numbers from 0 to 1
% randomly pick values in the cpdfn and find the corresponding "observation";
% the extra (0, uo(1)) point stops r < cpdfn(1) from returning NaN
rv = round(interp1([0 cpdfn], [uo(1) uo], r));
hr = hist(rv, 1:50);
hrc = cumsum(hr);
figure
plot(uo, cpdfn);                       % empirical CDF of the observations
hold all;
plot(1:50, hrc/max(hrc))               % CDF of the generated values

figure; hist(rv, 1:50);

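This produces the following plots (images not preserved): the empirical CDF of the observations overlaid on the CDF of the generated values, and a histogram of the generated values.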

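Note: this works better when you have more observations. In the current example you have relatively few samples, and only one of the ten lies above 15, so the space between 15 and 50 gets sampled about 10% of the time.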
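Since this answer is in Matlab, here is a rough R port of the same idea (invert the empirical CDF of the distinct values by linear interpolation), using base R's approx() in place of interp1. The leading (0, uo[1]) point is an assumption added so that draws below the first CDF value do not come back as NA:

```r
obs <- c(10, 11, 12, 13, 13, 13, 13, 14, 15, 50)

uo <- sort(unique(obs))              # distinct observed values
freq <- as.vector(table(obs))        # count of each value, in the same sorted order
cdfn <- cumsum(freq) / sum(freq)     # normalized empirical CDF

# Invert the CDF by linear interpolation, then round to whole lengths
set.seed(7)  # for reproducibility only
u <- runif(100)
rv <- round(approx(c(0, cdfn), c(uo[1], uo), xout = u)$y)

table(rv)
```

Replacing table(rv) with hist(rv, breaks = seq(9.5, 50.5, by = 1)) gives a histogram comparable to the Matlab one.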
