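I have five data.frames containing gene expression data for different sets of samples. Each data set has a different number of rows, so the row.names (genes) only partially overlap.
Now I want to a) filter the five data.frames so that they contain only the genes present in all of them, and b) combine the gene expression data for those genes into a single data.frame.
So far I have only found merge, but it can only merge two data.frames at a time, so I would have to call it several times. Is there an easier way?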
Merging is not very efficient if you want to exclude row names which are not present in every data frame. Here's a different proposal.
First, three example data frames:
df1 <- data.frame(a = 1:5, b = 1:5,
row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5, b = 1:5,
row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5, b = 1:5,
row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g
# row names present in all data frames: c and e
Put the data frames into a list:
dfList <- list(df1, df2, df3)
Find common row names:
idx <- Reduce(intersect, lapply(dfList, rownames))
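To see how Reduce arrives at the common row names, it can help to look at the intermediate results. The following is a small sketch using the same example data; the variable names rowNameSets and steps are made up for illustration. With accumulate = TRUE, Reduce returns every intermediate intersection instead of only the final one.

```r
# Same example data frames as above
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])
dfList <- list(df1, df2, df3)

# One character vector of row names per data frame
rowNameSets <- lapply(dfList, rownames)

# accumulate = TRUE keeps each intermediate result of the fold:
# step 1: row names of df1
# step 2: intersect(step 1, row names of df2)
# step 3: intersect(step 2, row names of df3)
steps <- Reduce(intersect, rowNameSets, accumulate = TRUE)

# The last step is the same as Reduce(intersect, rowNameSets)
idx <- steps[[length(steps)]]
print(steps)
```

The set of candidate row names can only shrink at each step, so the order of the data frames in the list does not change which names survive, only the order in which they are listed.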
Extract data:
df1[idx, ]
a b
c 3 3
e 5 5
PS. If you want to keep the corresponding rows from all data frames, you could replace the last step, df1[idx, ], with the following command:
do.call(rbind, lapply(dfList, "[", idx, ))
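The rbind call stacks the matching rows on top of each other. If the data frames hold different samples, as in the question, placing the columns side by side may be closer to what is wanted. Here is a minimal sketch of that variant, not part of the original answer; the list names set1, set2 and set3 are assumptions added so that the combined columns can be told apart.

```r
# Same example data frames as above
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])

# Naming the list elements is an assumption made for this sketch
dfList <- list(set1 = df1, set2 = df2, set3 = df3)

# Row names present in every data frame
idx <- Reduce(intersect, lapply(dfList, rownames))

# Subset every data frame to the common rows, in the same order;
# drop = FALSE keeps the result a data frame even with a single column
common <- lapply(dfList, function(d) d[idx, , drop = FALSE])

# Bind the columns side by side: one row per common row name
combined <- do.call(cbind, common)
print(combined)
```

Because every element of common is indexed with the same idx, the rows line up before cbind is called. Note that cbind does not match rows by name on its own, so the subsetting step should not be skipped.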
See the top answer in this SO post. Simply put your data frames into a list and apply the following line of code:
Reduce(function(...) merge(..., by = "x"), list.of.dataframes)
You only need to adjust the by argument to specify the common column on which the data frames should be merged.
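In the question the genes are stored as row names rather than in a column, so one possible adaptation is to copy the row names into a key column first. The sketch below assumes this setup; the tables expr1 to expr3, the gene names, the sample columns s1 to s6 and the key column name "gene" are all invented for illustration, and the sample column names are assumed to be unique across tables.

```r
# Hypothetical expression tables with partially overlapping genes
expr1 <- data.frame(s1 = c(1.2, 3.4, 5.6), s2 = c(2.1, 4.3, 6.5),
                    row.names = c("geneA", "geneB", "geneC"))
expr2 <- data.frame(s3 = c(0.5, 1.5, 2.5), s4 = c(3.5, 4.5, 5.5),
                    row.names = c("geneB", "geneC", "geneD"))
expr3 <- data.frame(s5 = c(9, 8, 7), s6 = c(6, 5, 4),
                    row.names = c("geneC", "geneB", "geneE"))
list.of.dataframes <- list(expr1, expr2, expr3)

# Copy the row names into a key column called "gene" (an assumed name)
with_key <- lapply(list.of.dataframes, function(d) {
  data.frame(gene = rownames(d), d, stringsAsFactors = FALSE)
})

# merge() defaults to all = FALSE, so only genes found in both inputs
# are kept at each step; Reduce applies it pairwise across the list
merged <- Reduce(function(x, y) merge(x, y, by = "gene"), with_key)

# Move the key column back into the row names
rownames(merged) <- merged$gene
merged$gene <- NULL
print(merged)
```

If the tables shared column names, merge would append suffixes such as .x and .y, which can become confusing with more than two inputs, so renaming columns beforehand is probably worthwhile in that case.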