
I'm trying to learn the data.table package in R. I have a data table named DT1 and a data frame DF1, and I'd like to subset rows according to a logical condition (a disjunction). This is my code for now:

DF1[DF1$c1==0 | DF1$c2==1,] #the data.frame way with the data.frame DF1
DT1[DT1$c1==0 | DT1$c2==1,] #the data.frame way with the data.table DT1

On page 5 of "Introduction to the data.table package in R", the author gives a similar example but with a conjunction (replace | with & in the second line above) and remarks that this is a bad use of the data.table package. He suggests doing it this way instead:

setkey(DT1,c1,c2)
DT1[J(0,1)]

So, my question is: how can I write the disjunction condition in data.table syntax? Is my second line, DT1[DT1$c1==0 | DT1$c2==1,], a misuse? Is there an equivalent of J for disjunction?


2 Answers


The documentation indicates that you can use:

DT1[c1==0 | c2==1, ]
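
To make the difference from the question's version concrete, here is a minimal sketch. The toy values and the id column are made up for illustration; only the column names c1 and c2 come from the question.

```r
library(data.table)

# Hypothetical toy data; c1 and c2 follow the question, id is added for tracking rows.
DT1 <- data.table(c1 = c(0, 0, 1, 2), c2 = c(1, 5, 1, 3), id = 1:4)

# Inside [ ] the columns are visible by name, so the DT1$ prefix is not needed.
res <- DT1[c1 == 0 | c2 == 1]

# The data.frame-style call from the question selects the same rows.
res_df_style <- DT1[DT1$c1 == 0 | DT1$c2 == 1, ]
same_rows <- isTRUE(all.equal(res, res_df_style))
```

Both forms are vector scans over every row; the shorter one is just easier to read and does not use a key.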
answered 2012-05-21T20:11:13.297

Here is another solution:

grpsize = ceiling(1e7/26^2)
DT <- data.table(
  x=rep(LETTERS,each=26*grpsize),
  y=rep(letters,each=grpsize),
  v=runif(grpsize*26^2))

setkey(DT, x)
system.time(DT1 <- DT[x=="A" | x=="Z"])
   user  system elapsed 
   0.68    0.05    0.74 
system.time(DT2 <- DT[J(c("A", "Z"))])
   user  system elapsed 
   0.08    0.00    0.07 
all.equal(DT1[, v], DT2[, v])
TRUE

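Note that I took the example from the data.table documentation. The only difference is that I no longer convert the letters to factors, because character keys are now allowed (see the NEWS for v1.8.0).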

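A short explanation: J is just a shorthand for data.table. So if you call J(0, 1), you create a data.table with two columns that are matched against the key, as in the example: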

> J(0,1)
     V1 V2
[1,]  0  1

However, you want to match two different elements in one column. So you need a data.table with one column. To get that, just add c().

J(c(0,1))
     V1
[1,]  0
[2,]  1
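
As a rough illustration of the two shapes, the sketch below runs both lookups against a small keyed table. The data and the id column are invented for the example, and it assumes the key is reset to the single column c1 before the one-column lookup.

```r
library(data.table)

# Hypothetical toy data; c1 and c2 follow the question, id is added for tracking rows.
DT1 <- data.table(c1 = c(0, 0, 1, 2), c2 = c(1, 5, 1, 3), id = 1:4)

# Two-column key: J(0, 1) is one lookup row, so it means c1 == 0 AND c2 == 1.
setkey(DT1, c1, c2)
both <- DT1[J(0, 1)]

# One-column key: J(c(0, 1)) is two lookup rows, so it means c1 == 0 OR c1 == 1.
setkey(DT1, c1)
either_c1 <- DT1[J(c(0, 1))]
```

Note that the keyed form expresses an OR over values of the same key column. A condition that spans two different columns, such as c1 == 0 | c2 == 1, is not covered by a single keyed lookup of this kind, so the logical expression inside [ ] remains the straightforward way to write it.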
answered 2012-05-22T07:47:18.020