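I am trying to identify rows in a data frame that match exactly except for two columns (that is, I want to exclude two columns from the comparison). Below is a sample data frame.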

x = c(1,2,3,4,5,6,7,1)
y = c(1,3,4,5,6,7,8,1)
z = c(1,4,5,6,7,8,9,1)
year = 1990:1997
day = c("mon","tues","wed","thurs","fri","sat","sun","sat")
# Build the data frame directly; wrapping the columns in cbind() would coerce them all to character
data = data.frame(x, y, z, year, day)

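In the example above, rows 1 and 8 match if we ignore the year and day columns. Is there a function that lets me do this? I looked at duplicated() and match(), but neither seems quite suitable.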

2 Answers

This seems to do what you want (sample data modified for better testing):

x = c(1,2,3,4,5,6,7,2,1,1,1)
y = c(1,3,4,5,6,7,8,1,2,1,1)
z = c(1,4,5,6,7,8,9,1,1,2,1)
year = 1990:2000
day = c("mon","tues","wed","thurs","fri","sat","sun","sat","sat","sun","mon")
# Build the data frame directly; wrapping the columns in cbind() would coerce them all to character
data = data.frame(x, y, z, year, day)

# Indices of rows whose values in columns 1:3 are all equal to each other (x == y == z)
which(apply(data[,1:3], 1, function(x){all(tail(duplicated(x), -1))}))
# [1]  1 11
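
A minimal sketch of how a within-row check differs from an across-row check, since the two happen to give the same answer on the data above. The data frame df and the variable names below are made up for illustration and are not part of the original answer.

```r
# Hypothetical data: rows 1 and 3 match each other on x, y, z,
# while row 2 merely holds three identical values within the row.
df <- data.frame(
  x = c(2, 5, 2),
  y = c(3, 5, 3),
  z = c(4, 5, 4),
  year = 1990:1992
)

# Within-row check: are x, y and z all equal inside a single row?
within_row <- which(df$x == df$y & df$y == df$z)

# Across-row check: does the (x, y, z) combination occur in another row?
across_rows <- which(duplicated(df[, 1:3]) |
                     duplicated(df[, 1:3], fromLast = TRUE))

print(within_row)   # 2
print(across_rows)  # 1 3
```

If the goal is to find rows that match other rows, the across-row form is probably the one to reach for.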
answered 2013-07-11T18:55:52.353

Here is my approach in two steps, once you know which columns to exclude (here, columns 4 and 5).

ind <- which(duplicated(data[,1:3], fromLast = TRUE) |
    duplicated(data[,1:3], fromLast = FALSE))
ind
## [1] 1 8

data[ind, ]
##   x y z year day
## 1 1 1 1 1990 mon
## 8 1 1 1 1997 sat
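
The same idea can be sketched with the excluded columns given by name rather than by position, which may be easier to maintain if the column order changes. The names exclude, key, is_dup and matches are illustrative choices, and the data is the question's sample rebuilt without cbind().

```r
# Sample data from the question, built with data.frame() directly
data <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 1),
  y = c(1, 3, 4, 5, 6, 7, 8, 1),
  z = c(1, 4, 5, 6, 7, 8, 9, 1),
  year = 1990:1997,
  day = c("mon", "tues", "wed", "thurs", "fri", "sat", "sun", "sat")
)

# Columns to ignore when comparing rows
exclude <- c("year", "day")
key <- data[, setdiff(names(data), exclude), drop = FALSE]

# Flag every row whose key columns occur more than once,
# scanning from both ends so the first occurrence is flagged too
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

matches <- data[is_dup, ]
print(which(is_dup))  # 1 8
```

Using drop = FALSE keeps key a data frame even if only one column remains after the exclusion.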
answered 2013-07-11T19:05:53.713