r - How can I assign a factor variable's regression coefficients to a new variable according to factor level?

Question

I'm new to R. After running a linear regression with the categorical variable sale_year

ols <- lm(logprice ~ x + factor(city) + factor(sale_year))

I want to create a new variable that holds, for each observation, the regression coefficient of the factor (sale_year) at that observation's sale_year.

     sale_year            new variable
     1980     coef(ols)["factor(sale_year)1980"]
     1973     coef(ols)["factor(sale_year)1973"]
     1990     coef(ols)["factor(sale_year)1990"]
     1990     coef(ols)["factor(sale_year)1990"]
     1973     coef(ols)["factor(sale_year)1973"]

      ...

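If there were no other factor variables, I could simply set every variable except sale_year to zero and use predict.lm to get the coefficients. With several factor variables that gets messy, and I haven't managed to make it work in R.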

In Stata, I can do it like this:

xi: reg logprice x i.city i.sale_year 
gen newvar = .
levelsof sale_year, local(saleyr)
foreach lv of local saleyr {
    replace newvar = _b[_Isaleyr`lv'] if sale_year == `lv'
}

How can I do this in R? Thanks!

score 2 · Accepted Answer

Since you didn't provide sample data, I'll use the iris data that ships with R:

mydata<-iris
mydata$Petal.Width<-as.factor(mydata$Petal.Width)
> str(mydata)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : Factor w/ 22 levels "0.1","0.2","0.3",..: 2 2 2 2 2 4 3 2 2 1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
myreg<-lm(Sepal.Length~Sepal.Width+Petal.Width+Species,data=mydata)
k<-length(levels(mydata$Petal.Width))
mycoef<-coef(myreg)[3:(k+1)]
mycoef<-data.frame(mycoef)
> head(mycoef)
                   mycoef
Petal.Width0.2 0.13981323
Petal.Width0.3 0.17193663
Petal.Width0.4 0.20220902
Petal.Width0.5 0.31915175
Petal.Width0.6 0.08864592

mycoef$var<-rownames(mycoef)
rownames(mycoef)<-1:dim(mycoef)[1]
> mycoef[,c("var","mycoef")]
              var     mycoef
1  Petal.Width0.2 0.13981323
2  Petal.Width0.3 0.17193663
3  Petal.Width0.4 0.20220902
4  Petal.Width0.5 0.31915175

Update:

mycoef$var1<-substring(mycoef$var,12,15)
myout<-merge(mydata,mycoef,by.x="Petal.Width",by.y="var1")
> head(myout)
  Petal.Width Sepal.Length Sepal.Width Petal.Length Species            var    mycoef
1         0.2          4.9         3.0          1.4  setosa Petal.Width0.2 0.1398132
2         0.2          4.7         3.2          1.3  setosa Petal.Width0.2 0.1398132
3         0.2          4.6         3.1          1.5  setosa Petal.Width0.2 0.1398132
4         0.2          5.0         3.6          1.4  setosa Petal.Width0.2 0.1398132
5         0.2          5.1         3.5          1.4  setosa Petal.Width0.2 0.1398132
6         0.2          5.4         3.7          1.5  setosa Petal.Width0.2 0.1398132
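
A note on the merge step: merge() does an inner join and sorts by the key by default, so rows at the baseline level (0.1 here), which have no coefficient, drop out and the original row order is lost. As a hedged alternative, the sketch below looks coefficients up by name instead. The column name pw_coef is made up for illustration, and treating the baseline as 0 assumes R's default treatment contrasts.

```r
# Sketch: per-row coefficient lookup by name, keeping every row in its original order.
mydata <- iris
mydata$Petal.Width <- as.factor(mydata$Petal.Width)
myreg <- lm(Sepal.Length ~ Sepal.Width + Petal.Width + Species, data = mydata)

# Build the coefficient name each row corresponds to, e.g. "Petal.Width0.2".
coef_names <- paste0("Petal.Width", as.character(mydata$Petal.Width))

# Indexing coef() by a name it does not contain (the baseline level) yields NA.
mydata$pw_coef <- unname(coef(myreg)[coef_names])

# The baseline level has no coefficient of its own; under treatment
# contrasts its effect is 0 by construction.
baseline <- levels(mydata$Petal.Width)[1]
mydata$pw_coef[mydata$Petal.Width == baseline] <- 0

head(mydata[, c("Petal.Width", "pw_coef")])
```

All 150 rows are kept, and no string slicing with substring() is needed.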

score 0 · Accepted Answer

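You still need predict.lm to get the baseline value for the first level of the factor, because that level has no coefficient (or rather, its coefficient is 0). All the other coefficients are effectively offsets from that value (assuming the result of predict is what you expect), so something like: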

  faclev1 <- predict(ols, list(x=mean(x), city=levels(city)[1], sale_year=levels(sale_year)[1]))
  otherlevs <- faclev1 + coef(ols)[grep("sale_year", names(coef(ols) ) )]

For a vector of coefficients matched to the individual cases:

 fac_coef <- c(0, coef(ols)[grep("sale_year", names(coef(ols) ) )])
 fac_coef[ as.numeric(factor(sale_year)) ]

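This works because the levels are in the same order as the displayed coefficients, and the factor's underlying integer codes follow that same level order. The leading 0 stands in for the baseline level.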
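
The indexing trick can be checked on made-up data. The following is a minimal sketch, not the asker's data: the data frame dat, its values, and the column newvar are all assumptions for illustration, and the model uses R's default treatment contrasts.

```r
# Simulated data; every name and value here is invented for illustration.
set.seed(1)
n <- 200
dat <- data.frame(
  x         = rnorm(n),
  city      = sample(c("A", "B", "C"), n, replace = TRUE),
  sale_year = sample(c(1973, 1980, 1990), n, replace = TRUE)
)
dat$logprice <- 1 + 0.5 * dat$x + 0.1 * (dat$sale_year == 1980) +
  0.3 * (dat$sale_year == 1990) + rnorm(n, sd = 0.1)

ols <- lm(logprice ~ x + factor(city) + factor(sale_year), data = dat)

# Prepend 0 for the baseline year, then index by each row's integer level code.
yr_coef  <- coef(ols)[grep("sale_year", names(coef(ols)))]
fac_coef <- c(0, yr_coef)
dat$newvar <- unname(fac_coef[as.integer(factor(dat$sale_year))])

# Cross-check against a lookup by coefficient name (baseline gives NA, set to 0).
by_name <- coef(ols)[paste0("factor(sale_year)", dat$sale_year)]
by_name[is.na(by_name)] <- 0
stopifnot(isTRUE(all.equal(dat$newvar, unname(by_name))))
```

Wrapping sale_year in factor() before taking the integer codes matters when the column is stored as a plain number: as.numeric(1980) is just 1980, not a level code.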
