r - Replace numeric values with NA based on conditions in other columns

Question

I'm new to the data.table package, so please bear with my simple question. I have a dataset that looks like DT:

library(data.table)
DT <- data.table(a = sample(c("C","M","Y","K"), 100, replace = TRUE),
                 b = sample(c("A","S"), 100, replace = TRUE),
                 f = round(rnorm(n = 100, mean = .90, sd = .08), digits = 2))
DT

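I want to replace any value in column f with NA when certain conditions are met. For example, to blank out values that fall outside the range 0.85 to 0.90 (that is, f < 0.85 or f > 0.90), I would have the following condition: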

DT$a == "C" & DT$b == "S" & DT$f < .85| DT$a == "C" & DT$b == "S" & DT$f >.90

I would also like to set different conditions for each category in columns a and b.

score 3 · Accepted Answer

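Use the condition you stated, but without the DT$ prefix, as the i argument to subset your data.table to the rows that meet the condition. Then, in j, assign NA to the f column by reference using the := operator. That is: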

DT[a == "C" & b == "S" & f < .85 | a == "C" & b == "S" & f >.90, f := NA]
which(is.na(DT$f))
# [1]  3 16 31 89
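
Because DT above is built from random draws, the row numbers printed will differ from run to run. As a minimal sketch of the same i / j / := pattern on a small fixed table (the four rows below are made up for illustration and are not the OP's data):

```r
library(data.table)

# Small deterministic table (illustrative values, not from the question)
DT <- data.table(a = c("C", "C", "C", "M"),
                 b = c("S", "S", "A", "S"),
                 f = c(0.80, 0.88, 0.80, 0.95))

# i: rows where a is "C", b is "S" and f lies outside [0.85, 0.90]
# j: overwrite f in place for exactly those rows
DT[a == "C" & b == "S" & (f < 0.85 | f > 0.90), f := NA]

print(DT)
na_rows <- which(is.na(DT$f))
print(na_rows)
```

Only the first row satisfies all three parts of the condition, so only its f becomes NA. Since := updates DT in place, there is no need to assign the result back to DT.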

Edit: following the OP's comment and @Joshua's nice suggestion:

`%between%` <- function(x, vals) { x >= vals[1] & x <= vals[2]}
`%nbetween%` <- Negate(`%between%`)
DT[a %in% c("C", "M", "Y", "K") & b == "S" & f %nbetween% c(0.85, 0.90), f := NA]

%nbetween% is the negation of %between% and gives the desired result (f < 0.85 or f > 0.90). Also note the use of %in% to check several values of a at once.
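
As an alternative sketch, data.table also ships its own between() function (and a matching %between% operator) with inclusive bounds by default, so the same filter can be written without defining a custom operator. The five rows below are made up for illustration:

```r
library(data.table)

# Illustrative rows only; not the OP's data
DT <- data.table(a = c("C", "M", "Y", "K", "C"),
                 b = c("S", "S", "S", "A", "S"),
                 f = c(0.80, 0.87, 0.93, 0.70, 0.90))

# !between(f, lower, upper) is TRUE when f < lower or f > upper;
# the bounds themselves count as "inside" (incbounds = TRUE is the default)
DT[a %in% c("C", "M", "Y", "K") & b == "S" & !between(f, 0.85, 0.90), f := NA]

print(DT)
na_rows <- which(is.na(DT$f))
print(na_rows)
```

Rows 1 and 3 are set to NA. Row 4 is left alone because b is "A", and row 5 is kept because 0.90 sits on the inclusive upper bound.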

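Edit 2: after the OP's complete rewrite of the question, I'm afraid there isn't much that can be simplified, other than grouping b == "A" and b == "S" together.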

`%nbetween%` <- Negate(`%between%`)
DT[a == "M" & b %in% c("A", "S") & f %nbetween% c(.85, .90), f := NA]
DT[a == "Y" & b %in% c("A", "S") & f %nbetween% c(.90, .95), f := NA]  # bounds must be given as c(lower, upper)
DT[a == "K" & b %in% c("A", "S") & f %nbetween% c(.95, 1.10), f := NA]
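
When the number of categories grows, one statement per category becomes repetitive. A possible alternative, sketched here under the assumption that each value of a has a single allowed range, is to keep the ranges in a small lookup table and apply them in one update join. The table name limits, its columns lo and hi, and all the bounds and data values below are hypothetical and chosen only for illustration:

```r
library(data.table)

# Illustrative data; not the OP's
DT <- data.table(a = c("M", "M", "Y", "Y", "K", "K"),
                 b = c("A", "S", "A", "S", "A", "S"),
                 f = c(0.80, 0.88, 0.92, 0.99, 1.00, 1.20))

# Hypothetical lookup table: one allowed range [lo, hi] per category of a
limits <- data.table(a  = c("M", "Y", "K"),
                     lo = c(0.85, 0.90, 0.95),
                     hi = c(0.90, 0.95, 1.10))

# Update join: for each row of DT matched on a, compare f with the
# bounds from limits (i.lo, i.hi) and set out-of-range values to NA
DT[limits, on = "a", f := fifelse(f < i.lo | f > i.hi, NA_real_, f)]

print(DT)
```

Rows of DT whose a has no entry in limits are not matched by the join and are left unchanged. If the ranges also depend on b, a b column could be added to limits and the join done with on = c("a", "b").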
