
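I have five data.frames containing gene expression data for different sets of samples. Each data.frame has a different number of rows, so their row.names (genes) only partially overlap.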

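Now I want to a) filter the five data.frames so that they contain only the genes present in all of them, and b) combine the gene expression data for those genes into a single data.frame.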

So far I have only found merge, but it joins just two data.frames at a time, so I would have to call it several times. Is there a simpler way?


Repeated merging is not very efficient if you only want to keep the row names that are present in every data frame. Here's a different approach.

First, three example data frames:

df1 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[1:5]) # letters a to e
df2 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[3:7]) # letters c to g
df3 <- data.frame(a = 1:5, b = 1:5, 
                  row.names = letters[c(1,2,3,5,7)]) # letters a, b, c, e, and g
# row names being present in all data frames: c and e

Put the data frames into a list:

dfList <- list(df1, df2, df3)

Find common row names:

idx <- Reduce(intersect, lapply(dfList, rownames))
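Reduce() applies intersect() pairwise from left to right, so for three data frames the line above is equivalent to nesting intersect() by hand. A small self-contained check that rebuilds the example data frames and compares the two forms:

```r
# The three example data frames from above
df1 <- data.frame(a = 1:5, b = 1:5, row.names = letters[1:5])
df2 <- data.frame(a = 1:5, b = 1:5, row.names = letters[3:7])
df3 <- data.frame(a = 1:5, b = 1:5, row.names = letters[c(1, 2, 3, 5, 7)])
dfList <- list(df1, df2, df3)

# Reduce() folds intersect() over the list of row-name vectors ...
idx <- Reduce(intersect, lapply(dfList, rownames))

# ... which gives the same result as nesting intersect() explicitly
idx_manual <- intersect(intersect(rownames(df1), rownames(df2)), rownames(df3))

print(idx)                      # [1] "c" "e"
print(identical(idx, idx_manual))  # [1] TRUE
```

The Reduce() form scales unchanged to five (or any number of) data frames, because only the list passed to lapply() grows.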

Extract the common rows, here from df1:

df1[idx, ]

  a b
c 3 3
e 5 5

PS. If you want to keep the corresponding rows from all data frames, you could replace the last step, df1[idx, ], with the following command:

do.call(rbind, lapply(dfList, "[", idx, ))
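In the question, each data frame holds a different set of samples (columns) for overlapping genes (rows). In that case stacking with rbind() is probably not what is wanted, and it fails when the column names differ. A sketch, assuming samples are columns, that instead combines the filtered frames side by side with cbind(). The data frames e1 to e3, the gene names g1 to g4 and the sample names s1 to s4 are hypothetical:

```r
# Hypothetical expression tables: genes as row names, samples as columns
e1 <- data.frame(s1 = c(1.2, 0.4, 3.1), row.names = c("g1", "g2", "g3"))
e2 <- data.frame(s2 = c(2.0, 0.9), s3 = c(1.1, 0.7), row.names = c("g3", "g1"))
e3 <- data.frame(s4 = c(0.5, 1.5, 2.5), row.names = c("g1", "g3", "g4"))
eList <- list(e1, e2, e3)

# Genes present in every data frame (order follows the first frame)
genes <- Reduce(intersect, lapply(eList, rownames))

# Subset every frame to the common genes in the same row order;
# drop = FALSE keeps single-column frames as data frames
filtered <- lapply(eList, function(d) d[genes, , drop = FALSE])

# Combine column-wise: one row per common gene, one column per sample
combined <- do.call(cbind, filtered)
print(combined)
```

Indexing by the shared `genes` vector also puts the rows of every frame in the same order, which cbind() relies on since it matches rows by position.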
answered 2013-05-29T08:33:07.417

See the top answer in this SO post. Just put your data frames in a list and apply the following line of code:

Reduce(function(...) merge(..., by = "x"), list.of.dataframes)

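You only need to adjust the by argument to specify the common column the data frames should be merged on. merge() performs an inner join by default (all = FALSE), so only keys present in every data frame are kept.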
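Because the genes in the question are stored as row names rather than in a column, one way to apply this approach is to first copy the row names into a column and merge on that. A minimal sketch; the column name gene and the example data frames are hypothetical:

```r
# Hypothetical expression tables with genes as row names
e1 <- data.frame(s1 = c(1.2, 0.4, 3.1), row.names = c("g1", "g2", "g3"))
e2 <- data.frame(s2 = c(2.0, 0.9), row.names = c("g3", "g1"))
e3 <- data.frame(s3 = c(0.5, 1.5, 2.5), row.names = c("g1", "g3", "g4"))

# Move the row names into an explicit "gene" column so merge() can join on it
with_gene <- lapply(list(e1, e2, e3), function(d) {
  d$gene <- rownames(d)
  rownames(d) <- NULL
  d
})

# Fold merge() over the list; the default inner join keeps only shared genes
merged <- Reduce(function(x, y) merge(x, y, by = "gene"), with_gene)
print(merged)
```

Merging directly with by = "row.names" is awkward inside Reduce(), because merge() then returns the keys in a new Row.names column rather than as row names, so the next merge step no longer finds them.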

answered 2013-05-29T08:11:17.163