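I am trying to bootstrap ZIP (zero-inflated Poisson) estimates while resampling from specific populations. Each population (cluster) is fundamentally different in some way, so I want them represented proportionally in the bootstrap. The strata argument of boot() does this.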
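To make the intent concrete, here is a minimal base-R sketch of the idea behind stratified resampling: indices are drawn with replacement within each stratum, so every stratum keeps its original size. The names toy_strata and stratified_indices are made up for illustration, and this is not meant to mirror how boot() implements its strata argument internally.

```r
# Hypothetical toy strata: 3 groups of 4 observations each
toy_strata <- factor(rep(c("a", "b", "c"), each = 4))

# Draw one stratified bootstrap sample of row indices:
# sample with replacement separately inside each stratum
stratified_indices <- function(strata) {
  idx_by_stratum <- split(seq_along(strata), strata)
  unlist(lapply(idx_by_stratum, function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }), use.names = FALSE)
}

set.seed(1)
i <- stratified_indices(toy_strata)

# Each stratum appears exactly as often as in the original data
table(toy_strata[i])
```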
I sometimes get the following error:
Error in solve.default(as.matrix(fit$hessian)) : system is computationally singular: reciprocal condition number = 2.02001e-16
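For context, solve() raises this kind of error when the matrix it is asked to invert is singular or nearly so. Below is a minimal sketch using a made-up design matrix X (an intercept plus a dummy for every group level, which are linearly dependent) rather than the actual Hessian from zeroinfl(). The names g, X, H and msg are illustrative only.

```r
# Hypothetical design: intercept plus one dummy per group level.
# The three dummies sum to the intercept column, so the columns are collinear.
g <- factor(rep(c("a", "b", "c"), each = 2))
X <- cbind(intercept = 1, model.matrix(~ g - 1))

# Cross-product matrix, standing in for a Hessian
H <- crossprod(X)

# Reciprocal condition number: values near 0 mean (near-)singular
rcond(H)

# solve() is expected to fail here; capture the message instead of stopping
msg <- tryCatch({ solve(H); "no error" }, error = function(e) conditionMessage(e))
msg
```

Comparing rcond() against .Machine$double.eps is a quick way to tell whether a matrix is too ill-conditioned for solve() at its default tolerance.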
Here is a way to reproduce the problem. It should take only about a minute to run, depending on your computer:
#Load dependencies
library(AER)
library(boot)
library(pscl)
library(sampling)
#generate some fake data. The seed is set to make it reproducible.
set.seed(1)
x1<-rpois(1000,1)
set.seed(1)
x2<-rnorm(1000,0,1)
set.seed(1)
e<-round(runif(1000,0,1)) #this should add some disruptions and prevent any multicollinearity.
pop<-rep(1:10,length.out=1000) #there are 10 populations
y<-x1*abs(floor(x2*sqrt(pop)))+e #the populations each impact the y variable somewhat differently
fake_data<-as.data.frame(cbind(y,x1,x2,pop))
fake_data$pop<-factor(pop) #they are not actually simple scalars.
#Run zip process, confirm it works. I understand it's not a matching model.
system.time(zip<-zeroinfl(y ~ x1+x2+pop | x1+x2+pop, data=fake_data))
#storing estimates to speed up bootstrapping phase. General technique from http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm
count_hold<-as.data.frame(dput(coef(zip, "count")))
count_short<-c(count_hold[,1])
zero_hold<-as.data.frame(dput(coef(zip, "zero")))
zero_short<-c(zero_hold[,1])
#bootstrapping
f <- function(fake_data, i) {
zip_boot<- zeroinfl(y ~ x1+x2+pop | x1+x2+pop, data=fake_data[i,], start=list(count=count_short, zero=zero_short))
return(coef(zip_boot))
} #defines function for R to repeat in bootstrapping phase.
set.seed(1)
system.time(res <- boot(fake_data, f, R = 50, strata = fake_data$pop)) #stratified by pop; boot() also accepts parallel= and ncpus= if you want to use several cores.
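One way to see how often the failure occurs, rather than having the whole run abort, is to wrap the fit in tryCatch() and return NAs for replicates that fail. The sketch below is illustrative only: safe_stat, toy_fit and n_coef are hypothetical names, and toy_fit is a stand-in that fails on purpose. In the script above, the fitting call would be the zeroinfl() call inside f.

```r
# Wrap a fitting function so that an error yields NAs instead of aborting.
# fit_fun: function(data) returning a numeric coefficient vector
# n_coef:  expected number of coefficients
safe_stat <- function(fit_fun, n_coef) {
  function(data, i) {
    tryCatch(
      fit_fun(data[i, , drop = FALSE]),
      error = function(e) rep(NA_real_, n_coef)
    )
  }
}

# Stand-in fit: fails whenever the resample has no variation in x
toy_fit <- function(d) {
  if (length(unique(d$x)) < 2) stop("singular fit")
  coef(lm(y ~ x, data = d))
}

toy <- data.frame(x = c(0, 0, 0, 1), y = c(1, 2, 3, 5))
stat <- safe_stat(toy_fit, n_coef = 2)

ok  <- stat(toy, 1:4)           # ordinary fit: two coefficients
bad <- stat(toy, c(1, 1, 2, 3)) # degenerate resample: all NA
```

Passed to boot() as the statistic, a wrapper like this keeps the replicate matrix rectangular, so the share of failed replicates could then be counted with something like mean(is.na(res$t[, 1])).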
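Given that I have more than 900 degrees of freedom and at least 100 observations in each population to draw my resampled estimates from, there should be enough data.
My question: 1) What did I do to cause this multicollinearity?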