Assuming your data.frame is called "mydf" and is defined as follows:
mydf <- structure(list(V1 = c("Assembly.1000", "Assembly.1000",
"Assembly.1000", "Assembly.1038", "Assembly.1338", "Assembly.1338"),
V2 = c("chrX", "chrX", "chrX", "chrX", "chrX", "chrX"),
V3 = c(560000L, 560000L, 560000L, 780000L, 960000L, 960000L),
V4 = c(575000L, 575000L, 575000L, 829000L, 999000L, 999000L),
V5 = c("ABC1", "IL15RA", "BRCA1", ".", "ACTIN", "ACTIN"),
V6 = c("20", "3.2", "20", ".", "3800", "4000")),
.Names = c("V1", "V2", "V3", "V4", "V5", "V6"),
class = "data.frame", row.names = c(NA, -6L))
mydf
# V1 V2 V3 V4 V5 V6
# 1 Assembly.1000 chrX 560000 575000 ABC1 20
# 2 Assembly.1000 chrX 560000 575000 IL15RA 3.2
# 3 Assembly.1000 chrX 560000 575000 BRCA1 20
# 4 Assembly.1038 chrX 780000 829000 . .
# 5 Assembly.1338 chrX 960000 999000 ACTIN 3800
# 6 Assembly.1338 chrX 960000 999000 ACTIN 4000
Here is the aggregate approach:
aggregate(cbind(V5, V6) ~ ., mydf, paste, collapse = "; ")
# V1 V2 V3 V4 V5 V6
# 1 Assembly.1000 chrX 560000 575000 ABC1; IL15RA; BRCA1 20; 3.2; 20
# 2 Assembly.1038 chrX 780000 829000 . .
# 3 Assembly.1338 chrX 960000 999000 ACTIN; ACTIN 3800; 4000
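In the formula cbind(V5, V6) ~ ., the "." on the right-hand side stands for every column of the data not already used on the left, so here it expands to V1 + V2 + V3 + V4, and those four columns become the grouping variables. The short base R sketch below spells that grouping out explicitly and checks that both calls agree. It recreates the same example data so that it runs on its own, and the object names by_dot and by_explicit are purely illustrative.

```r
# Same example data as above, rebuilt so this block is self-contained
mydf <- data.frame(
  V1 = c("Assembly.1000", "Assembly.1000", "Assembly.1000",
         "Assembly.1038", "Assembly.1338", "Assembly.1338"),
  V2 = rep("chrX", 6),
  V3 = c(560000L, 560000L, 560000L, 780000L, 960000L, 960000L),
  V4 = c(575000L, 575000L, 575000L, 829000L, 999000L, 999000L),
  V5 = c("ABC1", "IL15RA", "BRCA1", ".", "ACTIN", "ACTIN"),
  V6 = c("20", "3.2", "20", ".", "3800", "4000"),
  stringsAsFactors = FALSE
)

# "." expands to all columns not on the left-hand side (V1, V2, V3, V4)
by_dot <- aggregate(cbind(V5, V6) ~ ., data = mydf,
                    FUN = paste, collapse = "; ")

# The same grouping written out explicitly
by_explicit <- aggregate(cbind(V5, V6) ~ V1 + V2 + V3 + V4, data = mydf,
                         FUN = paste, collapse = "; ")

# Both calls should produce the same collapsed table
print(all.equal(by_dot, by_explicit))
print(by_explicit)
```

Spelling the grouping out is mainly useful when the data has extra columns that should be neither pasted nor used for grouping, since "." would silently pick them up as grouping variables.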
Here is the "data.table" approach, using the same "mydf" as the starting point:
library(data.table)
DT <- data.table(mydf)
DT[, lapply(.SD, paste, collapse = "; "), by = c("V1", "V2", "V3", "V4")]
# V1 V2 V3 V4 V5 V6
# 1: Assembly.1000 chrX 560000 575000 ABC1; IL15RA; BRCA1 20; 3.2; 20
# 2: Assembly.1038 chrX 780000 829000 . .
# 3: Assembly.1338 chrX 960000 999000 ACTIN; ACTIN 3800; 4000