r - Compute, arrange columns, and select "within" a data frame

Question

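I often have a data frame resulting from some computation that I want to clean up, rename, and column-arrange before output. All of the versions below work; the simple data.frame version comes closest to what I want.

Is there a way to combine the in-data-frame computation of within and mutate with the column-order preservation of data.frame(), without the additional and redundant [,....] at the end?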

library(plyr) 

# Given this chaotically named data.frame
d = expand.grid(VISIT=as.factor(1:2),Biochem=letters[1:2],time=1:5,
                subj=as.factor(1:3))
d$Value1 =round(rnorm(nrow(d)),2)
d$val2 = round(rnorm(nrow(d)),2)

# I would like to cleanup, compute and rearrange columns

# Simple and almost perfect
dDataframe = with(d, data.frame(
  biochem = Biochem,
  subj = subj,
  visit = VISIT,
  value1 = Value1*3 
))
# This simple solution is almost perfect, 
# but requires one more line
dDataframe$value2 = dDataframe$value1*d$val2

# For the following methods I have to reorder 
# and select in a second step

# use mutate from plyr to allow computation on computed values,
# which transform cannot do.
dMutate =   mutate(d,
  biochem = Biochem,
  subj = subj,
  visit = VISIT,
  value1 = Value1*3, #assume this is a time consuming function
  value2 = value1*val2
  # Could set fields = NULL here to remove,
  # but this does not help getting column order
)[,c("biochem","subj","visit","value1","value2")]

# use within. Same problem, order not preserved
dWithin = within(d, {
  biochem = Biochem
  subj = subj
  visit = VISIT
  value1 = Value1*3
  value2 = value1*val2       
})[,c("biochem","subj","visit","value1","value2")]


all.equal(dDataframe,dWithin)
all.equal(dDataframe,dMutate)

score 2 · Accepted Answer

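If you are willing to move to data.table, you can perform (most of) these operations by reference and avoid the copying associated with `[<-.data.frame` and `$<-.data.frame`.

setnames will rename the columns of a data.table, setcolorder will reorder the columns of a data.table, and := will assign by reference.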

library(data.table)
DT <- data.table(d)
# rename to lowercase only
setnames(DT, old = names(DT), new = tolower(names(DT)))
# reassign using `:=`
# note the use of `value1<-value1` to allow later use. 
# This will not be necessary once FR1492 has been implemented
# setting to NULL removes these columns
DT[, `:=`(value1 =value1<- value1*3, 
         value2  = value1 * val2, 
         val2 = NULL, time = NULL )]
setcolorder(DT, c("biochem","subj","visit","value1","value2"))
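
As an alternative to the `value1 <- value1` trick above, the two dependent assignments can be split into separate `:=` calls, so that the second call sees the updated column. This is only a minimal sketch, assuming data.table is installed; the `set.seed` call and the rebuilt `d` are there just to make the snippet self-contained.

```r
library(data.table)

# Rebuild the question's example data (seed added here only for reproducibility)
set.seed(1)
d <- expand.grid(VISIT = as.factor(1:2), Biochem = letters[1:2], time = 1:5,
                 subj = as.factor(1:3))
d$Value1 <- round(rnorm(nrow(d)), 2)
d$val2   <- round(rnorm(nrow(d)), 2)

DT <- as.data.table(d)

# Rename all columns to lowercase, in place
setnames(DT, old = names(DT), new = tolower(names(DT)))

# Two separate assignments by reference: the second one reads the
# value1 column that the first one has already overwritten
DT[, value1 := value1 * 3]
DT[, value2 := value1 * val2]

# Drop the columns that are not wanted in the output
DT[, c("val2", "time") := NULL]

# Reorder the remaining columns in place
setcolorder(DT, c("biochem", "subj", "visit", "value1", "value2"))
```

The cost is one extra `[` call; the benefit is that no assignment is hidden inside the argument list.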

If you are less concerned about memory efficiency and simply want to use data.table syntax, then

DT <- data.table(d)
DT[,list(  biochem = Biochem,   
    subj    = subj,
   visit   = VISIT,
   value1 = value1  <- Value1 * 3,
   value2  = value1 * val2       
   )]

will work.
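
The same result can also be written with a braced `j` expression, which computes the intermediate value first and then returns the list of output columns in the desired order. This is a sketch under the assumption that data.table is installed; `dOut` is just an illustrative name, and `d` is rebuilt so the snippet runs on its own.

```r
library(data.table)

# Rebuild the question's example data (seed added here only for reproducibility)
set.seed(1)
d <- expand.grid(VISIT = as.factor(1:2), Biochem = letters[1:2], time = 1:5,
                 subj = as.factor(1:3))
d$Value1 <- round(rnorm(nrow(d)), 2)
d$val2   <- round(rnorm(nrow(d)), 2)

DT <- as.data.table(d)

# Braces let j hold several statements; the last expression (a list)
# becomes the columns of the result, in the order written
dOut <- DT[, {
  value1 <- Value1 * 3          # computed once, reused below
  list(biochem = Biochem,
       subj    = subj,
       visit   = VISIT,
       value1  = value1,
       value2  = value1 * val2)
}]
```

This returns a new data.table and leaves `DT` unchanged, so it trades the memory efficiency of `:=` for a single self-contained expression.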

score 2 · Accepted Answer

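You can use summarize (or summarise) from the plyr package. From the documentation:

Summarise works in an analogous way to transform, except that instead of adding columns to an existing data frame, it creates a new data frame. [...]

For your example: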

library(plyr)
summarize(d,
  biochem = Biochem,
  subj    = subj,
  visit   = VISIT,
  value1  = Value1 * 3,
  value2  = value1 * val2       
)
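
To check that this gives the same columns as the question's data.frame version, the two can be compared directly. The following is a minimal sketch, assuming plyr is installed; the names `dSummarize` and `dReference` are made up for illustration, and the reference is built in base R by computing `value1` before calling `data.frame()`.

```r
library(plyr)

# Rebuild the question's example data (seed added here only for reproducibility)
set.seed(1)
d <- expand.grid(VISIT = as.factor(1:2), Biochem = letters[1:2], time = 1:5,
                 subj = as.factor(1:3))
d$Value1 <- round(rnorm(nrow(d)), 2)
d$val2   <- round(rnorm(nrow(d)), 2)

# plyr version: columns come out in the order they are written
dSummarize <- summarize(d,
  biochem = Biochem,
  subj    = subj,
  visit   = VISIT,
  value1  = Value1 * 3,
  value2  = value1 * val2
)

# Base R reference: compute value1 first, then build the data frame
dReference <- with(d, {
  value1 <- Value1 * 3
  data.frame(biochem = Biochem, subj = subj, visit = VISIT,
             value1 = value1, value2 = value1 * val2)
})

# Compare the two results
all.equal(dSummarize, dReference)
```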
