I have a data.frame (Data) and a subset of that data.frame (Data2):

set.seed(1)
Data <- data.frame(id = seq(1, 10), 
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE), 
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE), 
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE), 
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE), 
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE), 
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))

Data2 <- Data[1:4,]

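How can I get the "difference" of the two data.frames? I am looking for the rows that are in Data but not in Data2.

I thought something like Data[!Data2] should work, but it doesn't.

Thanks!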
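For context on why the attempt above fails: in base R, Data[i] with a single index selects columns of a data frame, not rows, and !Data2 is an element-wise logical negation (not meaningful for character or factor columns), so the expression never performs a row-wise set difference. A minimal base-R sketch, assuming rows should be compared on all columns, builds one string key per row and keeps the rows of Data whose key does not occur in Data2. The helper name row_key and the "\r" separator are illustrative choices, not part of the question.

```r
# Question's example data
set.seed(1)
Data <- data.frame(id = seq(1, 10),
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4, ]

# Hypothetical helper: paste all columns of each row into one string.
# "\r" is assumed not to occur in the data, so keys cannot collide.
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Keep the rows of Data whose key does not appear among Data2's keys
only_in_Data <- Data[!(row_key(Data) %in% row_key(Data2)), ]
only_in_Data
```

Unlike a count-based trick, this keeps duplicated rows of Data as long as they do not appear in Data2.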

3 Answers

I think you are mixing up data.table and data.frame syntax: Data[!Data2] is a data.table not-join idiom. This should work instead:

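library(data.table)
# Convert both data.frames to data.tables
Data <- data.table(Data)
Data2 <- data.table(Data2)

# Key both tables on all columns so rows are matched on every column
setkeyv(Data, colnames(Data))
setkeyv(Data2, colnames(Data2))

# Not-join: rows of Data whose key has no match in Data2
Data[!Data2]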
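In newer versions of data.table (the on= argument was added after this answer was written), the same not-join can be written without setting keys, by naming the join columns in on=. A sketch, assuming such a version is installed and recreating the question's data directly as data.tables:

```r
library(data.table)

# Question's example data, built directly as a data.table
set.seed(1)
Data <- data.table(id = seq(1, 10),
  Diag1 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag2 = sample(c("D123", "E123", "F123"), 10, replace = TRUE),
  Diag3 = sample(c("G123", "H123", "I123"), 10, replace = TRUE),
  Diag4 = sample(c("A123", "B123", "C123"), 10, replace = TRUE),
  Diag5 = sample(c("J123", "K123", "L123"), 10, replace = TRUE),
  Diag6 = sample(c("M123", "N123", "O123"), 10, replace = TRUE),
  Diag7 = sample(c("P123", "Q123", "R123"), 10, replace = TRUE))
Data2 <- Data[1:4]

# Not-join on every column: a row of Data is dropped only if an
# identical row exists in Data2. No setkey()/setkeyv() call is needed.
only_in_Data <- Data[!Data2, on = names(Data)]
only_in_Data
```

Because on= names the columns explicitly, the tables keep their original row order instead of being re-sorted by a key.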
answered 2013-10-22T16:39:50.747

data.table keys are your (best!) friend:

library(data.table)

Data  <- as.data.table(Data)
Data2 <- as.data.table(Data2)

## set whichever cols make sense as keys
setkey(Data, Diag1, Diag2, Diag3)  
## or to set all columns as key, use  
#  setkey(Data)

## Same key for Data2
setkey(Data2, Diag1, Diag2, Diag3)  
## or 
# setkeyv(Data2, key(Data))  # <~ Note: Use setkeyv for strings


Data[!Data2]  # not-join: rows of Data whose key has no match in Data2

   id Diag1 Diag2 Diag3 Diag4 Diag5 Diag6 Diag7
1:  5  A123  F123  G123  C123  K123  M123  Q123
2: 10  A123  F123  H123  B123  L123  N123  R123
3:  9  B123  E123  I123  C123  L123  N123  P123
4:  6  C123  E123  H123  C123  L123  M123  P123
5:  7  C123  F123  G123  C123  J123  M123  Q123
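Note that the key here covers only Diag1, Diag2 and Diag3, so a row of Data is also dropped when it merely agrees with some row of Data2 on those three columns; that is presumably why the output above has five rows rather than six. A toy illustration, where the tables x and y are hypothetical:

```r
library(data.table)

# Hypothetical toy tables: rows 1 and 3 of x agree on a and b but differ in id
x <- data.table(id = 1:3, a = c("p", "q", "p"), b = c("u", "v", "u"))
y <- x[1]

# Key on a and b only: row 3 matches y on (a, b), so it is dropped as well
setkey(x, a, b)
setkey(y, a, b)
partial_key <- x[!y]

# Key on all columns: only the identical row (id 1) is dropped
setkeyv(x, names(x))
setkeyv(y, names(y))
full_key <- x[!y]

partial_key  # only id 2
full_key     # ids 2 and 3
```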
answered 2013-10-22T16:41:01.147

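This works for your exact problem here, using the count function from plyr: rows that appear only once in the combined data come from Data alone.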

library(plyr)
df <- as.data.frame(rbind(Data, Data2)) # rbind data sets
df <- count(df, vars = names(df))       # count frequency of rows
subset(df, freq < 2)                    # subset the data.frame when freq < 2
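The count-based approach relies on two properties that hold in the question: every row of Data is unique, and every row of Data2 also appears in Data. If Data itself contains a duplicated row, that row gets a freq of 2 or more and is silently dropped; if Data2 contained a row absent from Data, it would show up in the result. A small hypothetical example of the first case:

```r
library(plyr)

# Hypothetical toy data: x = 1 appears twice in a and never in b
a <- data.frame(x = c(1, 1, 2))
b <- data.frame(x = 2)

cnt <- count(rbind(a, b), vars = names(a))  # one row per distinct x, plus freq
only_once <- subset(cnt, freq < 2)

cnt        # both x = 1 and x = 2 have freq 2
only_once  # no rows, even though x = 1 is in a but not in b
```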
answered 2013-10-22T23:26:06.997