I am having trouble subsetting a data.table (from the data.table package) in R. Given the following example:

library(data.table)

x = c(rep("a", 6), rep("b", 5))
y = c(0,2,1,0,1,2, 0,1,0,2,1)
z = c(1:6,1:5) + rnorm(11, 0.02, 0.1)

DT = data.table(ind = x, cond = y, dist = z)

      ind cond     dist
 [1,]   a    0 1.078966
 [2,]   a    2 1.987159
 [3,]   a    1 3.143391
 [4,]   a    0 3.937058
 [5,]   a    1 5.037681
 [6,]   a    2 6.036432
 [7,]   b    0 1.057809
 [8,]   b    1 2.144755
 [9,]   b    0 3.010903
[10,]   b    2 3.937765
[11,]   b    1 4.976273

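Within each ind group, I want to keep everything after the first 1 in the cond column. In other words, every row whose dist is greater than 3.143391 for a and greater than 2.144755 for b (in this example). So far I can only find the rows holding the first 1: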

DT.sub <- DT[cond == "1",] # Please, combine this row
DT.sub[,.SD[dist==min(dist)],by=ind] # With this to make the code shorter, if you can.

  ind cond     dist
[1,]   a    1 3.143391
[2,]   b    1 2.144755

The result should look like this:

      ind cond     dist
 [1,]   a    0 3.937058
 [2,]   a    1 5.037681
 [3,]   a    2 6.036432
 [4,]   b    0 3.010903
 [5,]   b    2 3.937765
 [6,]   b    1 4.976273
1 Answer

How about:

DT[,.SD[seq(match(1,cond)+1,.N)],by=ind]
     ind cond     dist 
[1,]   a    0 3.937058 
[2,]   a    1 5.037681 
[3,]   a    2 6.036432 
[4,]   b    0 3.010903 
[5,]   b    2 3.937765 
[6,]   b    1 4.976273 
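
To spell out how this works: within each ind group, match(1, cond) returns the position of the first cond equal to 1, and seq(..., .N) lists the row positions from the one after it to the end of the group. The sketch below rebuilds the sample data with a fixed seed, so the dist values differ from those printed above. The names res_seq and res_safe are illustrative only. The second form is a variant, not part of the original answer, written on the assumption that a group whose first 1 sits in its last row (or which has no 1 at all) should contribute no rows; in that case seq() would count downward or fail.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the run is reproducible
DT <- data.table(
  ind  = c(rep("a", 6), rep("b", 5)),
  cond = c(0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 1),
  dist = c(1:6, 1:5) + rnorm(11, 0.02, 0.1)
)

# Form used in the answer: rows from (first 1) + 1 up to the group size .N.
res_seq <- DT[, .SD[seq(match(1, cond) + 1, .N)], by = ind]

# Defensive variant: compare each row position with the position of the
# first 1. which() drops NA (no 1 in the group) and returns integer(0)
# when the first 1 is the last row, so such groups yield zero rows.
res_safe <- DT[, .SD[which(seq_len(.N) > match(1, cond))], by = ind]

print(res_seq)
print(res_safe)
```

On this sample both forms select the same six rows; they only diverge in the edge cases described above.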

By the way, it is best to call set.seed(1) first, so that we are all working with the same random data.
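
As a rough sketch of that suggestion, the example data can be built after seeding the generator. The same block also shows one possible way to merge the two lines from the question that find the first 1 per group into a single call. The name first1 is illustrative, and the comparison uses the number 1 rather than the string "1" because cond is numeric here.

```r
library(data.table)

set.seed(1)  # everyone running this gets the same rnorm() draws
x <- c(rep("a", 6), rep("b", 5))
y <- c(0, 2, 1, 0, 1, 2, 0, 1, 0, 2, 1)
z <- c(1:6, 1:5) + rnorm(11, 0.02, 0.1)

DT <- data.table(ind = x, cond = y, dist = z)

# Filter to cond == 1 in i, then keep the row with the smallest dist
# within each ind group; this merges the two steps from the question.
first1 <- DT[cond == 1, .SD[which.min(dist)], by = ind]
print(first1)
```

This returns one row per group, the earliest cond == 1 row by dist, which is the threshold the question describes.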

Answered 2012-06-27T12:55:14.320