
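I am trying to bootstrap zero-inflated Poisson (ZIP) estimates while resampling from specific populations. Each population (cluster) is fundamentally different in some way, so I want each one represented proportionally in the bootstrap. The strata argument of boot() does this.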
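To make the role of strata concrete, here is a minimal sketch on toy data (the data frame `toy` and the helper `count_by_stratum` are made up for illustration and are not part of my real problem). It assumes only the boot package and checks that every replicate keeps the original number of rows per stratum:

```r
library(boot)

# Hypothetical toy data: three strata of unequal size (5, 10 and 15 rows)
toy <- data.frame(g = factor(rep(c("a", "b", "c"), times = c(5, 10, 15))),
                  v = seq_len(30))

# Statistic: how many resampled rows fall in each stratum
count_by_stratum <- function(d, i) as.vector(table(d$g[i]))

set.seed(1)
res_toy <- boot(toy, count_by_stratum, R = 20, strata = toy$g)

# One row per replicate, one column per stratum
print(res_toy$t)
```

With strata supplied, rows are resampled with replacement within each stratum, so the per-stratum counts should be identical in every replicate.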

I sometimes get the following error:

Error in solve.default(as.matrix(fit$hessian)) : system is computationally singular: reciprocal condition number = 2.02001e-16
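As I understand it, the message means the Hessian is too close to singular for solve() to invert. The sketch below is separate from my data and only illustrates that idea: it builds a hypothetical design matrix with two perfectly collinear columns and uses X'X as a stand-in for a Hessian (an assumption for illustration, not what zeroinfl computes internally).

```r
# Hypothetical toy design: the third column is exactly twice the second,
# so the columns are perfectly collinear.
set.seed(1)
a <- rnorm(100)
X <- cbind(1, a, 2 * a)

# X'X is used here as an illustrative stand-in for a Hessian
H <- crossprod(X)

# Reciprocal condition number: values near 0 mean numerically singular
rc <- rcond(H)
print(rc)

# solve() is expected to refuse such a matrix; capture the message if it does
msg <- tryCatch({
  solve(H)
  "inverted without error"
}, error = function(e) conditionMessage(e))
print(msg)
```

A reciprocal condition number near machine precision, like the 2.02001e-16 in my error, is the same kind of signal.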

Here is a way to reproduce the problem. It should take only about a minute to run, depending on your computer:

#Load dependencies
library(AER)
library(boot)
library(pscl)
library(sampling)

#generate some fake data. The seed is set to make it reproducible.
set.seed(1) 
x1<-rpois(1000,1)
set.seed(1)  
x2<-rnorm(1000,0,1)
set.seed(1)
e<-round(runif(1000,0,1)) #this should add some noise and prevent any multicollinearity.
pop<-rep(1:10,length.out=1000)  #there are 10 populations
y<-x1*abs(floor(x2*sqrt(pop)))+e  #the populations each impact the y variable somewhat differently
fake_data<-as.data.frame(cbind(y,x1,x2,pop))
fake_data$pop<-factor(pop)  #they are not actually simple scalars.

#Run the ZIP model and confirm it works. I understand the model is not a good match for the data.
system.time(zip<-zeroinfl(y ~ x1+x2+pop | x1+x2+pop, data=fake_data))

#storing estimates to speed up bootstrapping phase. General technique from http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm
count_short<-unname(coef(zip, "count")) #count-model coefficients as a plain numeric vector
zero_short<-unname(coef(zip, "zero"))   #zero-inflation coefficients as a plain numeric vector

#bootstrapping
f <- function(fake_data, i) {
  zip_boot<- zeroinfl(y ~ x1+x2+pop | x1+x2+pop, data=fake_data[i,], start=list(count=count_short, zero=zero_short))
  return(coef(zip_boot))
  } #defines function for R to repeat in bootstrapping phase. 

set.seed(1)  
system.time(res <- boot(fake_data, f, R =50, strata=fake_data$pop)) #resamples within each population.

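Given that I have more than 900 degrees of freedom and at least 100 observations in each population to draw each resample from, the sample should be large enough.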

My question: 1) What did I do to cause this multicollinearity?
