r - Calculating standard deviation of samples with bootstrapping in R

Question

Imagine: I have sampled 10,000 humans and measured their height in cm, and drawn the distribution as follows:

# Generate sample data
sampleSize = 10000
sampleData = round(rnorm(n=sampleSize, mean=175, sd=14))

# Draw histogram of sample
h = hist(sampleData, breaks=max(sampleData)-min(sampleData))

######################################################################
# Calculate the mean of the measurement
meanMeasure = mean(sampleData)
meanMeasure
abline(v=meanMeasure, col="red")

# Calculate the standard deviation of the measurement
sdMeasure = sd(sampleData)
sdMeasure
rect(
    xleft=meanMeasure-sdMeasure,
    ybottom=min(h$counts),
    xright=meanMeasure+sdMeasure,
    ytop=max(h$counts),
    col="#0000ff22"
)

Now I want to estimate how large the standard deviation is for each measured body height. I thought that bootstrapping my original dataset would be a good method, i.e., sampling body heights from my original dataset with replacement.

Is this a good method? How can I perform this analysis in R (e.g. standard deviation for each height in a bootstrap analysis with 1000 cycles)?

score 2 · Accepted Answer

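If you measured each person only once, you cannot obtain a standard deviation "for each measured height". Bootstrapping can only be used when you have multiple data points for the quantity you want to estimate.

To obtain a standard deviation "for each measured height", every height would have to be measured more than once.

However, if you want a bootstrap estimate of the overall sample standard deviation, the other two answers apply.

Also, this question is better suited to crossvalidated.com.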
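For the overall sample standard deviation, a bootstrap with 1000 cycles could look roughly like the sketch below. It is only an illustration under assumptions: the seed and the names `nBoot`, `bootSd`, `bootSe`, and `bootCi` are not from the question, and the percentile interval is just one of several ways to summarise the resampled values.

```r
# Minimal sketch: bootstrap the overall sample standard deviation.
# The seed and the names nBoot, bootSd, bootSe, bootCi are illustrative assumptions.
set.seed(42)

# Same simulated data as in the question
sampleSize = 10000
sampleData = round(rnorm(n=sampleSize, mean=175, sd=14))

# Resample the data with replacement nBoot times and
# recompute the standard deviation on every resample
nBoot = 1000
bootSd = replicate(nBoot, sd(sample(sampleData, replace=TRUE)))

sdMeasure = sd(sampleData)                         # SD of the original sample
bootSe = sd(bootSd)                                # bootstrap standard error of that SD
bootCi = quantile(bootSd, probs=c(0.025, 0.975))   # 95% percentile interval

sdMeasure
bootSe
bootCi
```

Here `bootSe` describes how uncertain the single number `sdMeasure` is; it does not give a separate standard deviation for each height value.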

score 1 · Accepted Answer

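With a sample this large, there is no need at all to use the bootstrap for this purpose. If you wanted to know how much the standard deviation could plausibly vary in a sample of only 100, 200, or even 500 people, the bootstrap would be informative. But with 10,000 people, the bootstrap variation of the standard deviation will be very, very small.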
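The effect of sample size can be checked with a small simulation. This is a rough sketch, assuming the same normal model as in the question; the helper `bootSeOfSd` and the chosen sizes are hypothetical and exist only for this illustration.

```r
# Sketch: bootstrap standard error of the SD for different sample sizes.
# bootSeOfSd is a hypothetical helper, not part of any package.
set.seed(1)

bootSeOfSd = function(x, nBoot=1000) {
    # SD of the resampled standard deviations = bootstrap standard error
    sd(replicate(nBoot, sd(sample(x, replace=TRUE))))
}

sizes = c(100, 500, 10000)
seBySize = sapply(sizes, function(n) bootSeOfSd(rnorm(n, mean=175, sd=14)))
names(seBySize) = sizes

# The standard error shrinks as the sample grows
round(seBySize, 3)
```

For normally distributed data the standard error of the sample standard deviation is approximately sigma / sqrt(2n), so the value for 10,000 observations should come out at around a tenth of the value for 100 observations.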

score 1 · Accepted Answer

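Bootstrapping is typically used to compute the variance of an estimator, which in your case is the sample mean height. If you just want to find out how much people's heights vary, you do not need to bootstrap.

Why do we bootstrap? Because our one sample gives us only one sample mean. To compute the variance of that estimator, we would need many samples yielding many sample means. Bootstrapping is a way of obtaining many pseudo-samples when we have only one real sample.

In your case, we already have many individual observations of height, so we no longer need to bootstrap: we can compute the variance directly from our "real" observations.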
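To make the distinction concrete, the sketch below bootstraps the sample mean and compares the result with the textbook formula sd / sqrt(n). The seed and the names `bootMeans`, `bootSeMean`, and `analyticSeMean` are assumptions introduced for this example.

```r
# Sketch: variance of an estimator (the sample mean) versus spread of the data.
# Names bootMeans, bootSeMean, analyticSeMean are illustrative assumptions.
set.seed(7)
sampleData = round(rnorm(n=10000, mean=175, sd=14))

# Each pseudo-sample yields one sample mean
bootMeans = replicate(1000, mean(sample(sampleData, replace=TRUE)))

bootSeMean = sd(bootMeans)                                  # bootstrap standard error of the mean
analyticSeMean = sd(sampleData) / sqrt(length(sampleData))  # textbook standard error of the mean

bootSeMean
analyticSeMean

# The spread of the heights themselves needs no resampling
sd(sampleData)
```

The two standard errors should agree closely, and both are far smaller than `sd(sampleData)`, which measures how much individual heights differ from each other.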
