15

This is the first time I post to this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!

I am trying to get the 95% confidence interval (CI) for an interaction (that is my test statistic) by doing bootstrapping. I am using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine. As you can see, I have two within-subjects factors ("Num" and "Gram" and I am interested in the interaction between both):

Subject = rep(c("S1","S2","S3","S4"),4)
Num     = rep(c("singular","plural"),8) 
Gram    = rep(c("gram","gram","ungram","ungram"),4)
RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
data    = data.frame(Subject,Num,Gram,RT) 

This is the code I used to get the empirical interaction value:

summary(lm(RT ~ Num*Gram, data=data))

As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:

# You need the following packages
install.packages("car") 
install.packages("MASS")
install.packages("boot")
library("car")
library("MASS")
library("boot")

#Function to create the statistic to be boostrapped
boot.huber <- function(data, indices) {
data <- data[indices, ] #select obs. in bootstrap sample
mod <- lm(RT ~ Num*Gram, data=data)
coefficients(mod)       #return coefficient vector
}

#Generate bootstrap estimate
data.boot <- boot(data, boot.huber, 1999)

#Get confidence interval
boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction

My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)

Does anyone know how I could make sure that the resampling procedure used by "boot" respects subject level information?

Thanks a lot for your help/advice!

4

1 回答 1

12

只需将您的调用修改为boot()

data.boot <- boot(data, boot.huber, 1999, strata=data$Subject)

?boot提供了这个strata=参数的描述,它完全符合您的要求:

strata:一个整数向量或因子,指定多样本问题的层。这可以为任何模拟指定,但在 'sim = "parametric"' 时被忽略。当为非参数引导程序提供“层”时,模拟在指定的层内完成。


附加说明:

要确认它是否按照您的意愿工作,您可以调用debugonce(boot),运行上面的调用,并逐步执行调试器,直到已分配对象i(其行包含用于重新采样行以创建每个引导重新采样的索引),并且data然后看看它。

debugonce(boot)
data.boot <- boot(data, boot.huber, 1999, strata=data$Subject)
# Browse[2]>
## [Press return 34 times]
# Browse[2]> head(i)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
# [1,]    9   10   11   16    9   14   15   16    9     2    15    16     1    10
# [2,]    9   14    7   12    5    6   15    4   13     6    11    16    13     6
# [3,]    5   10   15   16    9    6    3    4    1     2    15    12     5     6
# [4,]    5   10   11    4    9    6   15   16    9    14    11    16     5     2
# [5,]    5   10    3    4    1   10   15   16    9     6     3     8    13    14
# [6,]   13   10    3   12    5   10    3    4    5    14     7    16     5    14
#      [,15] [,16]
# [1,]     7     8
# [2,]    11    16
# [3,]     3    16
# [4,]     3     8
# [5,]     7     8
# [6,]     7    12

(您可以随时进入Q离开调试器。)

于 2013-07-04T14:58:29.833 回答