r - Partial string replacement in data frame row names

Question

My question is more about improving my coding skills than about solving a problem: I did find a solution, but I feel it is not very elegant.

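I am working on a more complex version of the problem posted here. I am running multiple linear regressions and want to export the coefficients from all of them to a single csv file. Using that information, I was able to generate a list of all the coefficient tables and convert it into a list of data frames. My list of data frames looks like this: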

> coef.df
[[1]]
                    Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.08670899   0.357377 -0.2426261 0.8082950694
Var.0.0.Type.4   22.46262205   5.935317  3.7845698 0.0001539747

[[2]]
                   Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.1682616  0.3590799 -0.4685911 6.393619e-01
Var.0.5.Type.4   15.4974199  3.8693290  4.0051957 6.196616e-05

[[3]]
                   Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.1832488  0.3532577 -0.5187397 6.039423e-01
Var.1.0.Type.4   10.1225605  2.4475064  4.1358668 3.536172e-05

and so on.

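When I simply tried to convert this list to a csv file, the row names got messed up (every "(Intercept)" term after the first had a number appended):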

                   Estimate Std. Error     z value     Pr(>|z|)
(Intercept)      -0.08670899  0.3573770 -0.24262609 8.082951e-01
Var.0.0.Type.4   22.46262205  5.9353171  3.78456983 1.539747e-04
(Intercept)1     -0.16826164  0.3590799 -0.46859114 6.393619e-01
Var.0.5.Type.4   15.49741993  3.8693290  4.00519568 6.196616e-05
(Intercept)2     -0.18324877  0.3532577 -0.51873968 6.039423e-01
Var.1.0.Type.4   10.12256045  2.4475064  4.13586682 3.536172e-05
(Intercept)3     -0.14188918  0.3426645 -0.41407607 6.788184e-01
Var.1.5.Type.4    6.32348365  1.5164421  4.16994719 3.046702e-05
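
The numeric suffixes most likely come from the combining step. The question does not show that step, so the sketch below assumes the list was combined with `do.call(rbind, ...)`. The two toy tables and their numbers are made up for illustration. When `rbind()` is applied to data frames with duplicate row names, it makes them unique in the same way as `make.unique()` with `sep = ""`.

```r
# Two toy coefficient tables (made-up numbers) that share a row name
a <- data.frame(Estimate = c(-0.09, 22.46),
                row.names = c("(Intercept)", "Var.0.0.Type.4"))
b <- data.frame(Estimate = c(-0.17, 15.50),
                row.names = c("(Intercept)", "Var.0.5.Type.4"))

# Assumed combining step: row-bind every table in the list
combined <- do.call(rbind, list(a, b))
rownames(combined)
# The second "(Intercept)" comes back as "(Intercept)1"

# make.unique() with sep = "" appends the same kind of suffix
dedup <- make.unique(c("(Intercept)", "(Intercept)", "(Intercept)"), sep = "")
dedup
```

Because the suffix is added only when names collide, giving each intercept a distinct name before combining avoids it.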

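I know that rows must have unique names, and I would like to customize them using the name of the second coefficient in each model. What I want is a csv file that contains all the information in the following format, with the row names adjusted to show which variable a given intercept belongs to: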

                          Estimate Std. Error    z value     Pr(>|z|)
(Intercept.0.0.Type.4)   -0.0867089   0.357377  -0.2426261 0.8082950694
Var.0.0.Type.4           22.4626220   5.935317   3.7845698 0.0001539747
(Intercept.0.5.Type.4)   -0.1682616   0.359079  -0.4685911 6.393619e-01
Var.0.5.Type.4           15.4974199   3.869329   4.0051957 6.196616e-05
(Intercept.1.0.Type.4)   -0.1832488   0.353257  -0.5187397 6.039423e-01
Var.1.0.Type.4           10.1225605   2.447506   4.1358668 3.536172e-05

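I don't have much experience with partial string replacement, and although I managed to do it, I don't think my code is the most straightforward. Here is how I got this result: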

#I created a vector containing all row names
df.names <- unlist(lapply(coef.df, rownames))
> df.names
 [1] "(Intercept)" "Var.0.0.Type.4" "(Intercept)" "Var.0.5.Type.4"
 [5] "(Intercept)" "Var.1.0.Type.4" "(Intercept)" "Var.1.5.Type.4"
 [9] "(Intercept)" "Var.0.0.Type.5" "(Intercept)" "Var.0.5.Type.5"
[13] "(Intercept)" "Var.1.0.Type.5" "(Intercept)" "Var.1.5.Type.5"
#I created a vector with all "(Intercept)" elements from df.names
inter.lm <- df.names[c(TRUE, FALSE)]
> inter.lm
[1] "(Intercept)" "(Intercept)" "(Intercept)" "(Intercept)" "(Intercept)"
[6] "(Intercept)" "(Intercept)" "(Intercept)"
#I created a vector with all remaining elements from df.names
var.lm <- df.names[c(FALSE, TRUE)]
> var.lm
[1] "Var.0.0.Type.4" "Var.0.5.Type.4" "Var.1.0.Type.4" "Var.1.5.Type.4" 
[5] "Var.0.0.Type.5" "Var.0.5.Type.5" "Var.1.0.Type.5" "Var.1.5.Type.5"
#I removed the "Var" part from all elements in var.lm
var.temp <- gsub("Var(.*)", "\\1", var.lm)
> var.temp
[1] ".0.0.Type.4" ".0.5.Type.4" ".1.0.Type.4" ".1.5.Type.4" ".0.0.Type.5"
[6] ".0.5.Type.5" ".1.0.Type.5" ".1.5.Type.5"
#I removed the ")" part from all elements in inter.lm
inter.temp <- gsub("\\)", "", inter.lm) 
> inter.temp
[1] "(Intercept" "(Intercept" "(Intercept" "(Intercept" "(Intercept"
[6] "(Intercept" "(Intercept" "(Intercept"
#I pasted together vectors inter.temp and var.temp to get the required names
inter.new <- paste(inter.temp,var.temp,")",sep="")
> inter.new
[1] "(Intercept.0.0.Type.4)" "(Intercept.0.5.Type.4)" "(Intercept.1.0.Type.4)"   
[4] "(Intercept.1.5.Type.4)" "(Intercept.0.0.Type.5)" "(Intercept.0.5.Type.5)"
[7] "(Intercept.1.0.Type.5)" "(Intercept.1.5.Type.5)"
#I merged the inter.new and var.lm vectors to get the correct naming
df.names <- c(rbind(inter.new, var.lm))
> df.names
 [1] "(Intercept.0.0.Type.4)" "Var.0.0.Type.4"
 [3] "(Intercept.0.5.Type.4)" "Var.0.5.Type.4"
 [5] "(Intercept.1.0.Type.4)" "Var.1.0.Type.4"
 [7] "(Intercept.1.5.Type.4)" "Var.1.5.Type.4"
 [9] "(Intercept.0.0.Type.5)" "Var.0.0.Type.5"
[11] "(Intercept.0.5.Type.5)" "Var.0.5.Type.5"
[13] "(Intercept.1.0.Type.5)" "Var.1.0.Type.5"
[15] "(Intercept.1.5.Type.5)" "Var.1.5.Type.5"
#Finally I changed the row names
rownames(final.df) <- df.names

Is there a simpler/shorter way to get the names I want?
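
One shorter possibility, offered as a sketch rather than a definitive answer, is to rename the intercept inside each data frame before combining, so the names are already unique when the tables are stacked. The code assumes every table has exactly two rows, with "(Intercept)" first and a name starting with "Var" second. The toy `coef.df`, the helper `rename.intercept`, and the output file name are hypothetical stand-ins, not part of the original code.

```r
# Toy stand-in for coef.df (made-up numbers); only the row names matter here
coef.df <- list(
  data.frame(Estimate = c(-0.09, 22.46),
             row.names = c("(Intercept)", "Var.0.0.Type.4")),
  data.frame(Estimate = c(-0.17, 15.50),
             row.names = c("(Intercept)", "Var.0.5.Type.4"))
)

# Hypothetical helper: name the intercept after the table's second row
rename.intercept <- function(d) {
  suffix <- sub("^Var", "", rownames(d)[2])           # e.g. ".0.0.Type.4"
  rownames(d)[1] <- paste0("(Intercept", suffix, ")")
  d
}

# Rename within each table first, then stack; no name collisions remain
final.df <- do.call(rbind, lapply(coef.df, rename.intercept))
rownames(final.df)

# write.csv(final.df, "coefficients.csv")  # placeholder file name
```

If the names are already collected in a single vector such as `df.names`, the same `sub()` and `paste0()` calls can be applied to `df.names[c(FALSE, TRUE)]` and the result assigned to `df.names[c(TRUE, FALSE)]`, which replaces the intermediate vectors with one assignment.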

