r - Splitting a data.table with the by-operator: functions that return numeric values and/or NAs fail

Question

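I have a data.table with two columns: an ID column and a value column. I want to split the table by the ID column and run a function foo on the value column. This works fine as long as foo does not return NA. When it does, I get an error telling me that the types of the groups are inconsistent. My assumption is this: because is.logical(NA) equals TRUE and is.numeric(NA) equals FALSE, data.table internally assumes that I want to combine logical values with numeric ones and returns an error. However, I find this behavior peculiar. Any comments on that? Am I missing something obvious, or is this indeed the intended behavior? If so, a short explanation would be great. (Note that I do know a workaround: let foo2 return a completely improbable number and filter for it later. However, that seems like bad coding.)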

Here is the example:

library(data.table)
foo1 <- function(x) {if (mean(x) < 5) {return(1)} else {return(2)}}
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT <- data.table(ID=rep(c("A", "B"), each=5), value=1:10)
DT[, foo1(value), by=ID] #Works perfectly
     ID V1
[1,]  A  1
[2,]  B  2
DT[, foo2(value), by=ID] #Throws error
Error in `[.data.table`(DT, , foo2(value), by = ID) : 
columns of j don't evaluate to consistent types for each group: result for group 2 has column 1 type 'logical' but expecting type 'numeric'

score 11 · Accepted Answer

You can fix this by having your function return NA_real_ rather than a bare NA, whose default type is logical.

foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}
DT[, foo2(value), by=ID] #Throws error
# Error in `[.data.table`(DT, , foo2(value), by = ID) : 
# columns of j don't evaluate to consistent types for each group: 
# result for group 2 has column 1 type 'logical' but expecting type 'numeric'

foo3 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA_real_)}}
DT[, foo3(value), by=ID] #Works
#      ID V1
# [1,]  A  1
# [2,]  B NA
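
If foo2 cannot be edited (for example, because it comes from another package), a possible alternative is to coerce its result inside j. This is only a sketch, not something the original answer proposes: it assumes every group should yield a double, and it reuses DT and foo2 exactly as defined in the question.

```r
library(data.table)

# Same data as in the question.
DT <- data.table(ID = rep(c("A", "B"), each = 5), value = 1:10)

# Unchanged function from the question: returns numeric 1 or a bare (logical) NA.
foo2 <- function(x) {if (mean(x) < 5) {return(1)} else {return(NA)}}

# Coerce each group's result to double inside j, so that every group
# hands back the same type regardless of which branch foo2 took.
res <- DT[, as.numeric(foo2(value)), by = ID]
print(res)
```

Here as.numeric(NA) evaluates to NA_real_, so group B returns a double NA just as foo3 does.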

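As an aside, the message given when foo2() fails is actually quite helpful. It basically tells you that your NA is of the wrong type. To fix the problem, you just need to look up the NA constant of the correct type (or class):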

NAs <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
data.frame(constantName = sapply(NAs, deparse), 
           class        = sapply(NAs, class),
           type         = sapply(NAs, typeof))

#    constantName     class      type
# 1            NA   logical   logical
# 2   NA_integer_   integer   integer
# 3      NA_real_   numeric    double
# 4 NA_character_ character character
# 5   NA_complex_   complex   complex
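
The asker's hypothesis about is.logical(NA) can be checked directly in base R. The short sketch below (the variable name mixed is only illustrative) also shows why the issue rarely surfaces in ordinary vector code: c() silently promotes a logical NA to the type of its neighbours, whereas the grouped j in the error above compares the type of each group's result.

```r
# A bare NA is a logical constant, not a numeric one.
typeof(NA)             # "logical"
is.logical(NA)         # TRUE
is.numeric(NA)         # FALSE

# The typed constants carry the type they are named after.
typeof(NA_real_)       # "double"
typeof(NA_integer_)    # "integer"

# c() coerces the logical NA to double when combined with a number.
mixed <- c(1, NA)
typeof(mixed)          # "double"

# Coercing a bare NA by hand yields the typed constant.
identical(as.numeric(NA), NA_real_)   # TRUE
```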
