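I have two data frames (db1 and db2), and I want to find the positions in db2 of the rows whose parameters match those of each row in db1. This can be done with a for loop as follows: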

db1 <- data.frame(id=rep(1:4,each=4),
                  class=sample(1:10, 16, replace=TRUE),
                  var=rnorm(16)
                  )
db2 <- expand.grid(id=1:4, class=1:10)
db2$x <- rnorm(nrow(db2))

for(i in 1:nrow(db1)) print(which(db2$id==db1$id[i] & db2$class==db1$class[i]))

But loops are inefficient, so I would like to vectorize this one. Is it possible to pass vectors to which() so that the function searches db2 for every value in db1?

2 Answers

library(data.table)
db1 <- data.table(db1)
db2 <- data.table(db2)
# You can index by additional columns as necessary
setkeyv(db1, c("id","class"))
setkeyv(db2, c("id","class"))

# Show only records in db2 that match id and class with db1

db2[db1,]

      id class           x         var
 [1,]  1     1 -0.50266835  0.82391749
 [2,]  1     9 -1.21245991 -1.43163848
 [3,]  1     9 -1.21245991 -0.68622189
 [4,]  1    10 -0.28659235 -0.98107793
 [5,]  2     4  2.18779836  1.25841256
 [6,]  2     6  1.32407301  0.42287395
 [7,]  2     7 -0.53808409 -0.12069089
 [8,]  2    10 -0.67679146 -0.73930821
 [9,]  3     7  0.03133591  0.31142901
[10,]  3     8  0.78927215  1.86952233
[11,]  3     9 -0.04674115 -0.45102021
[12,]  3    10 -0.83388764 -0.04354332
[13,]  4     8  1.17608109 -0.07343352
[14,]  4     8  1.17608109 -0.00053299
[15,]  4     9  0.59344187 -0.21407897
[16,]  4    10 -2.06237055  0.78420146

# To just return an index of matching rows
db2[db1, which=TRUE]

 [1]  1  9  9 10 14 16 17 20 27 28 29 30 38 38 39 40

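# To get only unique row indices
# (by= is needed in data.table >= 1.9.8, where unique() compares all columns by default)
db2[unique(db1, by=c("id","class")), which=TRUE]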
[1]  1  9 10 14 16 17 20 27 28 29 30 38 39 40
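
On more recent data.table versions the same lookup can probably be written without setting keys, using the on= argument. The following is a minimal sketch under that assumption; on= and the set.seed() call are not part of the answer above and are added here only for illustration and reproducibility. Because no key is set, neither table is re-sorted, so the returned indices refer to db2's original row order, as in the loop from the question.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
db1 <- data.table(id = rep(1:4, each = 4),
                  class = sample(1:10, 16, replace = TRUE),
                  var = rnorm(16))
db2 <- as.data.table(expand.grid(id = 1:4, class = 1:10))
db2[, x := rnorm(.N)]

# Ad hoc join on id and class; which = TRUE returns row numbers of db2
# instead of the joined rows, one per row of db1 (in db1's order).
idx <- db2[db1, on = c("id", "class"), which = TRUE]
idx
```

Each (id, class) pair occurs exactly once in db2 here, so idx has one entry per row of db1.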
answered 2012-06-05T14:39:41.920

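If db1 and db2 have the same number of rows, this prints all rows of db2 whose 'id' and 'class' equal those of db1. Note that == compares the two data frames position by position (row i of db2 against row i of db1), so it does not search all of db2 for each row of db1: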

print(db2[db2$id == db1$id & db2$class == db1$class,])

The same query sorted by db2$id:

matched <- db2[db2$id == db1$id & db2$class == db1$class, ]
print(matched[order(matched$id, decreasing = TRUE), ])
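
When the two data frames have different numbers of rows, as in the question, a vectorized lookup in base R could instead build one composite key per row and pass both key vectors to match(). This is only a sketch: the names key1, key2 and pos are made up for illustration, set.seed() is added for reproducibility, and it assumes each (id, class) pair occurs at most once in db2, since match() returns only the first match.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
db1 <- data.frame(id = rep(1:4, each = 4),
                  class = sample(1:10, 16, replace = TRUE),
                  var = rnorm(16))
db2 <- expand.grid(id = 1:4, class = 1:10)
db2$x <- rnorm(nrow(db2))

# One composite key per row (hypothetical helper names key1/key2).
key1 <- paste(db1$id, db1$class, sep = "_")
key2 <- paste(db2$id, db2$class, sep = "_")

# For every db1 row, the position of the first matching db2 row (NA if none).
pos <- match(key1, key2)
pos
```

db2[pos, ] then gives the matching db2 rows in the same order as db1.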
answered 2012-06-05T11:30:02.763