I am using plm to fit a fixed-effects regression model.


FE.model <- plm(fml, data = data.reg2,
           index = c('Site.ID', 'date.hour'), # cross-section ID and time-series ID
           model = 'within')                  # fixed-effects ("within") estimator


What I want to do is get my fitted values (my yhats) and join them to my base dataset, data.reg2.


 Fe.model.fitted <- FE.model$model[[1]] - FE.model$residuals

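However, this only gives me a column vector containing just the fitted values; I cannot join it to my base dataset. I also tried cbind, which fails: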


 Fe.model.fitted <- cbind(data.reg2, resid=resid(FE.model), fitted=fitted(FE.model))


 Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "pseries" to a data.frame


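I should note that I do not want to compute the yhats manually from my betas. I have too many independent variables for that, and the formula I defined (fml) may change, so that option would not be practical.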



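Merging the fitted values from plm back into the original dataset requires some intermediate steps: plm drops any rows with missing data and, as far as I can tell, a plm object does not contain the index information. The order of the data is not preserved by plm either; see the comment by Giovanni Millo, one of the authors of plm, in this thread.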



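  1. Get the fitted values from the estimated plm object. They come as a single vector, but the entries are named. The names correspond to positions in the index.
  2. Get the index with the index() function. It can return both the individual index and the time index. Note that the index may contain more rows than the fitted values if rows were dropped because of missing data. (The index could also be generated directly from the original data, but I did not see any promise that the original order of the data is preserved in what plm returns.)
  3. Merge into the original data, looking up the id and time values from the index.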
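
The three steps above can be sketched without plm at all. In this minimal illustration a plain lm() fit stands in for the panel model (an assumption made only to keep the example self-contained); like the plm output described above, its fitted values come back as a named vector that is shorter than the data when rows are dropped. The names dat, fit_obj, fit_tab and out are hypothetical.

```r
# Toy panel with one missing regressor value (all names are illustrative)
set.seed(1)
dat <- data.frame(id   = rep(c("a", "b", "c"), each = 4),
                  time = rep(1:4, times = 3),
                  x    = rnorm(12))
dat$y <- 2 * dat$x + rnorm(12) / 10
dat$x[5] <- NA                       # row 5 (id "b", time 1) will be dropped

# Stand-in for the panel model: lm() also drops NA rows and names its output
fit_obj <- lm(y ~ x + factor(id), data = dat)

# Step 1: fitted values as a named vector. With default row names,
# the names are the row positions in dat.
fit_vec <- fitted(fit_obj)
pos     <- as.integer(names(fit_vec))

# Step 2: look up id and time for every fitted value
fit_tab <- data.frame(id     = dat$id[pos],
                      time   = dat$time[pos],
                      fitted = as.numeric(fit_vec))

# Step 3: merge on the keys; all.x = TRUE keeps the row that was dropped
out <- merge(dat, fit_tab, by = c("id", "time"), all.x = TRUE)
```

Merging on id and time rather than binding by position is what makes the result independent of row order; the dropped row simply ends up with NA in the fitted column.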

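Sample code is provided below. It is a bit long, but I have tried to comment it. The code is not optimized; my intention was to list the steps explicitly. Also, I am using data.tables rather than data.frames.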

library(data.table); library(plm)

### Generate dummy data. This way we know the "true" coefficients
n <- 500 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n / 5))))
DT[, x1 := rnorm(n)]
DT[, x2 := rnorm(n)]
DT[, y  := x1 + 2 * x2 + rnorm(n) / 10]

setkey(DT, id, time)
# # Make it an unbalanced panel & put in some NAs
DT <- DT[!(id == "a" & time == 4)]
DT[.("a", 3), x2 := as.numeric(NA)]
DT[.("d", 2), x2 := as.numeric(NA)]


### Run the model -- both individual and time effects; "within" model
summary(PLM <- plm(data = DT, index = c("id", "time"), formula = y ~ x1 + x2, model = "within", effect = "twoways", na.action = "na.omit"))

### Merge the fitted values back into the data.table DT
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y))

# Since the objects returned by plm() do not contain the index, need to generate it from the data
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index
# Note: The index can also be done as INDEX <- DT[, j = .(id, time)], but use the longer way with index() in case plm does not preserve the order
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes
INDEX[, id := as.character(id)]
INDEX[, time := as.integer(time)] # it is returned as a factor, convert back to integer to match the variable type in DT

# Generate the fitted values as the difference between the y values and the residuals
if (all(names(PLM$residuals) == names(PLM$model$y))) { # this should not be needed, but just in case...
    FIT <- data.table(
        index   = as.integer(names(PLM$model$y)), # this index corresponds to the position in the INDEX, from where we get the "id" and "time" below
        fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals)
    )
}

FIT[, id   := INDEX[index]$id]
FIT[, time := INDEX[index]$time]
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for)
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped!
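
As a cross-check on the merge above, the same fitted values can be approximated with a different technique: a least-squares dummy-variable (LSDV) regression in base R. This is not what plm does internally and not the method used above; it is a sketch under the assumption that a two-way "within" model and an lm() with id and time dummies give the same residuals, which should hold for the slope coefficients by the Frisch-Waugh-Lovell theorem. The names dat and lsdv are hypothetical.

```r
# Toy unbalanced-style panel with one NA in a regressor (illustrative names)
set.seed(42)
dat <- data.frame(id   = rep(c("a", "b", "c", "d"), each = 5),
                  time = rep(1:5, times = 4),
                  x1   = rnorm(20),
                  x2   = rnorm(20))
dat$y <- dat$x1 + 2 * dat$x2 + rnorm(20) / 10
dat$x2[7] <- NA

# LSDV regression: id and time dummies play the role of the two-way effects.
# na.action = na.exclude makes fitted() and residuals() pad dropped rows with NA,
# so the results have one entry per row of dat, in the original order.
lsdv <- lm(y ~ x1 + x2 + factor(id) + factor(time),
           data = dat, na.action = na.exclude)

dat$fit.lsdv <- as.numeric(fitted(lsdv))
dat$res.lsdv <- as.numeric(residuals(lsdv))
```

Because the padded vectors already line up with the rows of dat, no index lookup is needed here; the trade-off is that estimating one dummy per id and per time period can get slow for large panels.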

The residuals are the deviation of the model from the value on the LHS of the formula ... which you have not shown us. There is a fitted.panelmodel function in the 'plm' package, but it appears to expect a fitted value that the plm function does not return by default, nor is it documented to do so, nor is there a way that I can see to make it cough one up.

data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))
summary(zz)  # the example on the plm page:
> str(fitted(zz))
> names(zz$model)
[1] "log(gsp)"  "log(pcap)" "log(pc)"   "log(emp)"  "unemp"    
> Produc[ , c("Yvar", "Fitted")] <- cbind( zz$model[ ,"log(gsp)", drop=FALSE], zz$residuals)
> str(Produc)
'data.frame':   816 obs. of  12 variables:
 $ state : Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year  : int  1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 ...
 $ pcap  : num  15033 15502 15972 16406 16763 ...
 $ hwy   : num  7326 7526 7765 7908 8026 ...
 $ water : num  1656 1721 1765 1742 1735 ...
 $ util  : num  6051 6255 6442 6756 7002 ...
 $ pc    : num  35794 37300 38670 40084 42057 ...
 $ gsp   : int  28418 29375 31303 33430 33749 33604 35764 37463 39964 40979 ...
 $ emp   : num  1010 1022 1072 1136 1170 ...
 $ unemp : num  4.7 5.2 4.7 3.9 5.5 7.7 6.8 7.4 6.3 7.1 ...
 $ Yvar  :Classes 'pseries', 'pseries', 'integer'  atomic [1:816] 10.3 10.3 10.4 10.4 10.4 ...
  .. ..- attr(*, "index")='data.frame': 816 obs. of  2 variables:
  .. .. ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Fitted: num  -0.04656 -0.03064 -0.01645 -0.00873 -0.02708 ...

1) pdata.frame sorts your input alphabetically by name, then by year. This can be handled by sorting the data frame first, before running plm.

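2) Rows with NA in any variable included in the formula are dropped. I handled this by creating a second formula that also contains my id and time variables, and then using model.frame to extract the data used in the regression (NAs excluded, but now including id and time).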

library(plm)

n <- 10 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.frame(id = c("a","c","b","d","e"), time = c(1:(n / 5)), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
DT$Y = DT$x2 + 2 * DT$x3 + rnorm(n) / 10 # make Y a function of x2 and x3
DT$x3[3]=NA  # add an NA to show this works with missing data 

# drop.index = FALSE keeps the index columns, but note that the pdata.frame is sorted by order(id, time)
pdata.frame(DT, index = c('id', 'time'), drop.index = FALSE)

# order DT to match the pdata.frame that will be used for plm
DT <- DT[order(DT$id, DT$time), ]

# formulas
formulas =Y~x1+x2+x3 
formulas_dataframe = Y~x1+x2+x3 +id+time # add id and time for model.frame

# estimate
random <- plm(formulas, data=DT, index=c("id", "time"), model="random",na.action = 'na.omit')

# merge predictions and model.frame
fitted = data.frame(fitted = as.numeric(random$model[[1]]) - as.numeric(random$residuals))
model_data = cbind(as.data.frame(as.matrix(random$model)), fitted)  # not really needed, but shows that the input and the model.frame are the same
model_data = cbind(model_data, na.omit(model.frame(formulas_dataframe, DT)))
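
Since the cbind above relies on both pieces having the same rows in the same order, it can help to see which rows model.frame actually keeps. The following base-R sketch (no plm involved; DT_sorted, used and dropped are hypothetical names) sorts by id and time, as described in point 1, and then uses a formula extended with id and time, as in point 2, to identify the retained and dropped rows.

```r
# Small panel with ids deliberately out of alphabetical order
set.seed(7)
DT <- data.frame(id   = rep(c("a", "c", "b", "d", "e"), times = 2),
                 time = rep(1:2, each = 5),
                 x1   = rnorm(10))
DT$Y <- 3 * DT$x1 + rnorm(10) / 10
DT$x1[3] <- NA                      # row 3 is id "b", time 1

# Point 1: sort by id, then time, before estimating
DT_sorted <- DT[order(DT$id, DT$time), ]

# Point 2: adding id and time to the formula carries the keys along;
# model.frame drops rows with NA by default (na.action = na.omit)
used <- model.frame(Y ~ x1 + id + time, data = DT_sorted)

# Original row names survive sorting and NA removal, so the dropped rows
# can be identified by comparing row names
dropped <- DT_sorted[!(rownames(DT_sorted) %in% rownames(used)),
                     c("id", "time")]
```

Checking nrow(used) against the length of the model's residuals before binding columns together is a cheap safeguard against silently misaligned rows.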

I wrote a function (predict.out.plm) to use after estimating first-difference or fixed-effects models with plm.




Fe.model.fitted <- cbind(FE.model$model, 


