r - Using R data.table with "all()" to subset data

Question

I have the following data table:

  dt=structure(list(a = c("10", "10", "20", "30", "10", "25", "10"
    ), b = c("0.605887455840394", "0", "0.709466017509524", "0", 
    "0.585528817843856", "-0.109303314681054", "-0.453497173462763"
    ), c = c("-0.919322002474128", "0", "0.630098551068391", "0", 
    "-1.81795596770373", "-0.276184105225216", "-0.284159743943371"
    ), d = c("-0.750531994502331", "0", "1.81731204370422", "0", 
    "-0.116247806352002", "0.370627864257954", "0.520216457554957"
    ), e = c("0.298723699267293", "0", "-0.886357521243213", "0", 
    "0.816899839520583", "-0.331577589942552", "1.12071265166956"
    ), key = c("A", "A", "B", "B", "C", "C", "C")), .Names = c("a", 
    "b", "c", "d", "e", "key"), row.names = c(NA, -7L), class = c("data.table", 
        "data.frame"), sorted = "key")

This gives me a data table like the one shown below.

    a                  b                  c                  d                  e key
1: 10  0.605887455840394 -0.919322002474128 -0.750531994502331  0.298723699267293   A
2: 10                  0                  0                  0                  0   A
3: 20  0.709466017509524  0.630098551068391   1.81731204370422 -0.886357521243213   B
4: 30                  0                  0                  0                  0   B
5: 10  0.585528817843856  -1.81795596770373 -0.116247806352002  0.816899839520583   C
6: 25 -0.109303314681054 -0.276184105225216  0.370627864257954 -0.331577589942552   C
7: 10 -0.453497173462763 -0.284159743943371  0.520216457554957   1.12071265166956   C

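I want to do a subset operation that removes the rows that are all zeros.

I was thinking of something like

dt[!(all(i[2:4) == 0)] but I am not sure how to actually express this in data.table.

Any help with this would be greatly appreciated.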

score 3 · Accepted Answer

This seems like the perfect opportunity to use a not-join. This will require setting the key to be the columns you wish to subset on:

keys <- names(dt)[2:5]
setkeyv(dt, keys)

 dt[!as.list(rep("0", length(keys)))]

Note that currently your key columns are character (hence the "0" strings in the not-join), which will be more efficient than if they were numeric.
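The not-join idea can be tried on a small table first. The following is a minimal sketch, assuming a made-up table `toy` (not the question's data) with two character value columns; note that `setkeyv()` reorders the rows by the key columns.

```r
library(data.table)

# Hypothetical toy data: character columns, as in the question
toy <- data.table(id = 1:4,
                  b  = c("0.5", "0", "0",   "1.2"),
                  c  = c("0",   "0", "0.3", "0"))

keys <- c("b", "c")
setkeyv(toy, keys)   # sorts toy by b, then c, and marks them as the key

# Not-join: look up the key combination ("0", "0") and keep everything else
kept <- toy[!as.list(rep("0", length(keys)))]
kept                 # the row with id 2 (all zeros) is gone
```

Because the key is set by reference, `toy` itself stays sorted by `b` and `c` afterwards.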

score 2 · Accepted Answer

1) The first line creates a logical vector that selects the appropriate rows, and the second line selects them:

ok <- dt[, ! apply(.SD == 0, 1, all), .SDcols = 2:5]
dt[ok]

2) We could also write it with any, which saves one character plus a space:

ok <- dt[, apply(.SD != 0, 1, any), .SDcols = 2:5]
dt[ok]

3) For a small number of columns this is even shorter:

dt[ apply(cbind(b, c, d, e) != 0, 1, any) ]

4) And, also for a small number of columns, this one is shorter still and simpler:

dt[ b != 0 | c != 0 | d != 0 | e != 0 ]
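When there are too many columns to spell out as in 4), the same OR-chain can be built programmatically. A minimal sketch, assuming a made-up table `toy` and a hypothetical column vector `cols`:

```r
library(data.table)

# Hypothetical toy data: character columns, as in the question
toy <- data.table(id = 1:4,
                  b  = c("0.5", "0", "0",   "1.2"),
                  c  = c("0",   "0", "0.3", "0"))
cols <- c("b", "c")

# One logical vector per column ("is non-zero"), combined with `|`
keep <- toy[, Reduce(`|`, lapply(.SD, function(x) x != 0)), .SDcols = cols]
keep                  # TRUE FALSE TRUE TRUE

result <- toy[keep]   # drops the row where b and c are both "0"
```

Since the columns are character, `x != 0` compares against the string "0", so a value stored as "0.0" would not count as zero; converting the columns with as.numeric() first would avoid that.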

score 1 · Accepted Answer

Here is a two-step solution:

dt[
    !dt[,
        .I[all(sapply(.SD,function(x)x=="0"))]
    ,by=1:nrow(dt),.SDcols=letters[2:5]]$V1
]

which yields:

    a                  b                  c                  d                  e key
1: 10  0.605887455840394 -0.919322002474128 -0.750531994502331  0.298723699267293   A
2: 20  0.709466017509524  0.630098551068391   1.81731204370422 -0.886357521243213   B
3: 10  0.585528817843856  -1.81795596770373 -0.116247806352002  0.816899839520583   C
4: 25 -0.109303314681054 -0.276184105225216  0.370627864257954 -0.331577589942552   C
5: 10 -0.453497173462763 -0.284159743943371  0.520216457554957   1.12071265166956   C

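The inner part selects the row indices ".I" of the rows that satisfy the condition. The outer brackets subset "dt", excluding those rows by using the not operator "!".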
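The two steps can also be run separately to see what each one returns. A sketch on a made-up table `toy`, where `zero_rows` is a hypothetical name for the intermediate result:

```r
library(data.table)

# Hypothetical toy data: character columns, as in the question
toy <- data.table(id = 1:4,
                  b  = c("0.5", "0", "0",   "1.2"),
                  c  = c("0",   "0", "0.3", "0"))

# Step 1: grouping by row number makes .I the index of the current row;
# it is returned only when every selected column equals "0"
zero_rows <- toy[, .I[all(sapply(.SD, function(x) x == "0"))],
                 by = 1:nrow(toy), .SDcols = c("b", "c")]$V1

# Step 2: negate the row numbers to exclude those rows
result <- toy[!zero_rows]
```

Grouping by every row is convenient but can be slow on large tables, so the vectorised variants may be preferable there.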
