3

I have two tables, one with more rows than the other. I would like to extract the rows that both tables share. I tried the solutions proposed here.

The problem, however, is that it is a large dataset and the computation takes quite a while. Is there a simpler solution? I know how to find the shared rows of both tables using:

k <- rownames(x1)
l <- rownames(x)
o <- which(rownames(x1) %in% l)

Here x1 and x are my data frames. But this only gives me the shared rows. How can I get the rows that are unique to each table and exclude them, so that I can simply cbind both tables together?


2 Answers

2

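(I edited the whole answer.) You can merge the data frames with merge() (from Andrie's comment). Also check ?merge to see all the options you can pass as the by argument; by = 0 means row.names.

The code below shows an example of what your data frames might look like (different numbers of rows and columns):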

x = data.frame(a1 = c(1,1,1,1,1), a2 = c(0,1,1,0,0), a3 = c(1,0,2,0,0), row.names = c('y1','y2','y3','y4','y5'))
x1 = data.frame(a4 = c(1,1,1,1), a5 = c(0,1,0,0), row.names = c('y1','y3','y4','y5'))

Assuming the row names can be used as identifiers, we add them as a new column and merge on that column:

x$id <- row.names(x)
x1$id <- row.names(x1)

# merge by column names
merge(x, x1, by = intersect(names(x), names(x1)))

# result
#   id a1 a2 a3 a4 a5
# 1 y1  1  0  1  1  0
# 2 y3  1  1  2  1  1
# 3 y4  1  0  0  1  0
# 4 y5  1  0  0  1  0

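I hope this solves the problem.

Edit: OK, now I feel silly. If all columns have different names in the two data frames, you don't need to add the row names as another column. Just use: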

merge(x,x1, by=0)
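As a minimal sketch of what by = 0 returns, using the example data frames from above: merge() puts the matched row names into a column called Row.names, which can be moved back into the row names if needed. The variable name merged is only illustrative.

```r
x  <- data.frame(a1 = c(1,1,1,1,1), a2 = c(0,1,1,0,0), a3 = c(1,0,2,0,0),
                 row.names = c('y1','y2','y3','y4','y5'))
x1 <- data.frame(a4 = c(1,1,1,1), a5 = c(0,1,0,0),
                 row.names = c('y1','y3','y4','y5'))

# Inner join on row names: only rows present in both data frames are kept
merged <- merge(x, x1, by = 0)

# Move the Row.names column back into the row names ("merged" is an illustrative name)
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL
```

Row y2 exists only in x, so it is dropped from the result.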
answered 2012-07-26T14:04:26.383
0

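If you only want the rows of each dataset that do not appear in the other:

k <- rownames(x1)
l <- rownames(x)
o <- which(k %in% l)                         # positions in x1 of rows shared with x
x1.uniq <- x1[!(k %in% l), , drop = FALSE]   # rows found only in x1
x.uniq <- x[!(l %in% k), , drop = FALSE]     # rows found only in x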

Then you can join them with rbind (this requires both data frames to have the same columns):

x2 <- rbind(x1.uniq, x.uniq)

If you also want the repeated rows, you can add them:

x.repeated <- x1[o, , drop = FALSE]   # select rows, not columns
x2 <- rbind(x2, x.repeated)
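Since the question ultimately asks to cbind the rows that both tables share, a shorter route may be to work on the row names directly with intersect() and setdiff(). This is only a minimal sketch, assuming the row names are unique identifiers in both data frames; x and x1 here are small made-up examples, and the names shared, only.x, only.x1 and both are illustrative.

```r
# Made-up example data; row names are assumed to be unique identifiers
x  <- data.frame(a1 = c(1,1,1,1,1), a2 = c(0,1,1,0,0),
                 row.names = c('y1','y2','y3','y4','y5'))
x1 <- data.frame(a4 = c(1,1,1,1), a5 = c(0,1,0,0),
                 row.names = c('y1','y3','y4','y6'))

shared  <- intersect(rownames(x), rownames(x1))  # row names in both tables
only.x  <- setdiff(rownames(x), rownames(x1))    # row names only in x
only.x1 <- setdiff(rownames(x1), rownames(x))    # row names only in x1

# Index both data frames by the same names so the rows line up, then bind columns
both <- cbind(x[shared, , drop = FALSE], x1[shared, , drop = FALSE])
```

Indexing both tables by name avoids relying on them being in the same row order.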
answered 2012-07-26T12:42:38.367