Here is a short function that splices your data frame on a given column (byCol), keeping the rows that have a specified value (byVal):
spliceDF <- function(df, byVal, byCol="attr", preserveField="user") {
  # returns the spliced df with renamed columns

  # identify which rows will be returned (logical vector, one entry per row)
  rows <- df[[byCol]] == byVal

  # append the suffix to every column except the one used as the merge key
  nm <- names(df) != preserveField
  names(df)[nm] <- paste(names(df)[nm], byVal, sep="_")

  return(df[rows, ])
}
It can then be called inside merge as follows:
# merge the two spliced data frames
merge(spliceDF(mydf, "a"), spliceDF(mydf, "b"), by="user", all=TRUE)
For clarity, that last line can be split into three separate lines:
# Splice the df into two separate dfs
df_a <- spliceDF(mydf, byVal="a", byCol="attr")
df_b <- spliceDF(mydf, byVal="b", byCol="attr")
# merge the two into one
merge(df_a, df_b, by="user", all=TRUE)
Code for the example data used above:
# build the data frame from your example
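mydf <- data.frame(user=c(100,100,101),
                   attr=c("a","b","a"),
                   val =c(10, 20, 11),
                   # note: these dates are unquoted, so R evaluates them as
                   # arithmetic (2012-11-09 is 1992, 2012-11-08 is 1993)
                   date=c(2012-11-09,2012-11-08,2012-11-09)
                   )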
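One caveat about the date column above: because the values are unquoted, R evaluates 2012-11-09 as the subtraction 2012 - 11 - 9, so the column holds 1992 and 1993 rather than dates. If you want real dates, a minimal sketch would be to quote the values and convert them with as.Date. The name mydf_dates is only an illustrative assumption, not part of the original example.

```r
# Hypothetical variant of mydf that stores real Date values
# (mydf_dates is an illustrative name, not from the original example)
mydf_dates <- data.frame(
  user = c(100, 100, 101),
  attr = c("a", "b", "a"),
  val  = c(10, 20, 11),
  date = as.Date(c("2012-11-09", "2012-11-08", "2012-11-09"))
)

# Inspect the column types; date should now be of class Date
str(mydf_dates)
```

The rest of the answer works the same way with either version of the data frame, since only the contents of the date column change.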
UPDATE:
Looking at ?merge, it has a suffixes argument.
Trying suffixes=c("_a", "_b") works nicely, with no helper function needed:
merge(mydf[mydf$attr=="a", ], mydf[mydf$attr=="b", ],
      by="user", suffixes=c("_a", "_b"), all=TRUE)
# OUTPUT
  user attr_a val_a date_a attr_b val_b date_b
1  100      a    10   1992      b    20   1993
2  101      a    11   1992   <NA>    NA     NA
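One thing to keep in mind with the suffixes approach: merge only adds the suffixes to non-key columns whose names appear in both inputs. That is always the case in the example above, because both sides come from the same data frame. The small sketch below uses two made-up data frames (left, right, and the extra column are assumptions for illustration) to show that a column present on only one side keeps its name.

```r
# Two hypothetical inputs: "val" exists in both, "extra" only in the right one
left  <- data.frame(user = c(100, 101), val = c(10, 11))
right <- data.frame(user = 100, val = 20, extra = "x")

merged <- merge(left, right, by = "user",
                suffixes = c("_a", "_b"), all = TRUE)

# Only the clashing column "val" is suffixed; "extra" is left untouched
names(merged)
```

If you need every non-key column renamed regardless of clashes, the spliceDF function does that explicitly.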