3

我目前正在处理来自实验的一些数据。因此,我有一些被随机分配到 2 种不同治疗方法的人的数据。对于每次治疗,我们进行了三个疗程。在每次会议中,参与者被要求做出一系列决定。

我想做的是:(1)用一个模型估计治疗的效果,该模型包括对个体的随机效应,然后,(2)按会话对标准误差进行聚类。

在 R 中,我可以使用plm包轻松估计随机效应模型:

model.plm<-plm(formula=DependentVar~TreatmentVar+SomeIndependentVars,data=data,
    model="random",effect="individual")

我的问题是我无法通过变量会话(即个人参与的会话)对标准误差进行聚类。确实,plm 包的稳健协方差矩阵估计器让我可以在两种类型的集群之间进行选择:“”和“时间”。因此,如果我选择“组”选项,我会在个人级别获得标准错误:

vcovHC(model.plm,type="HC0",cluster="group")

有没有办法选择不同的聚类变量?

我将非常感谢您的帮助。

4

1 回答 1

1

您可能对此感兴趣: https ://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements

这是我对“内部”模型的解决方案:

  #' Fixed effect cluster regression, estimated efficiently using plm()
  #' @param form The model formula.
  #' @param data The data.frame.
  #' @param index Character vector giving the column name indexing individual units.
  #' @param cluster Character vector giving the column name indexing clusters, or "robust" to avoid the bootstrap and just return robust SE.
  #' @param param A list of control parameters, with named elements as follows:  R is the number of bootstrap replicates. 
  #' @return Coefficients plus clustered standard errors
  feClusterRegress <- function( form, data, index, cluster = "robust", param = list( R = 30 ) ) {
    if( "data.table" %in% class(data) )  data <- as.data.frame(data) # Not ideal efficiency-wise since I re-convert it later but necessary until I generalize the code to data.tables (the plm call doesn't work with them, for instance)
    stopifnot( class(form)=="formula" )
    mdl <- plm( form, data = data, model = "within", effect="individual", index = index )
    if( cluster=="robust" ) {
      res <- summary( mdl, robust=TRUE )
    } else { # Bootstrap
      require(foreach)
      require(data.table)
      # Prepare data structures for efficient sampling
      clusters <- unique( data[[cluster]] )
      if( is.null(clusters) )  stop("cluster must describe a column name that exists!")
      clusterList <- lapply( clusters, function(x) which( data[[cluster]] == x ) )
      names(clusterList) <- clusters
      progressBar <- txtProgressBar( 0, param$R )
      # Convert to data.table and drop extraneous variables
      data <- as.data.table( data[ , c( all.vars(form), index ) ] ) # For faster sub-setting
      # Sample repeatedly
      coefList <- foreach( i = seq( param$R ) ) %dopar% {
        setTxtProgressBar( progressBar, i )
        clusterSample <- sample( as.character( clusters ), replace=TRUE )
        indexSample <- unlist( clusterList[ clusterSample ], use.names=FALSE )
        dataSample <- data[ indexSample, ]
        dataSample[ , fakeTime := seq(.N), by = index ] # fakeTime is necessary due to a potential bug in plm.  See https://stats.stackexchange.com/questions/85909/why-does-a-fixed-effect-ols-need-unique-time-elements
        try( coefficients( plm( form, data = as.data.frame(dataSample), model = "within", effect="individual", index = c( index, "fakeTime") ) ) )
      }
      failed <- vapply( coefList, function(x) class(x) == "try-error", FUN.VALUE=NA )
      if( any(failed) ) {
        warning( "Some runs of the regression function failed" )
        coefList <- coefList[ !failed ]
      }
      coefMat <- stack( coefList )
      SE <- apply( coefMat, 2, sd )
      res <- structure( 
        list( 
          cbind( coefficients( mdl ), SE ),
          model = mdl
        ),
        class = "feClusterPLM",
        R = param$R
      )
    }
    res         
  }

我怀疑您实际上需要变量,因此不要生成假时间,而是生成一个“假”组——只需在您获取每个引导样本后立即组成一个新的组标识符。

于 2014-03-04T22:58:37.000 回答