The following code segfaults R 2.15.0 with data.table 1.8.9.

library(data.table)
d = data.table(date = c(1,2,3,4,5), value = c(1,2,3,4,5))

# works as expected
d[-5][, mean(value), by = list(I(as.integer((date+1)/2)))]

# crashes R
d[-5, mean(value), by = list(I(as.integer((date+1)/2)))]

On a related note, the following two commands, which I would expect to be equivalent, produce very different output:

d[-5][, value, by = list(I(as.integer((date+1)/2)))]
#    I value
# 1: 1     1
# 2: 1     2
# 3: 2     3
# 4: 2     4

d[-5, value, by = list(I(as.integer((date+1)/2)))]
#    I         value
# 1: 1 2.121996e-314
# 2: 1 2.470328e-323
# 3: 2 3.920509e-316
# 4: 2 2.470328e-323

A simpler command from the comments that also crashes R:

d[-5, value, by = date]

As Ricardo points out, it's the combination of negative indexing and by that creates the problem.


2 Answers


One hypothesis is that the problem is related to the following lines in [.data.table:

o__ = if (length(o__)) irows[o__]
      else irows

o__ eventually gets passed to the C code (dogroups.c), as -5 in this case. One could imagine a negative row index causing problems with pointer arithmetic there, leading to segfaults and/or erroneous values.

A potential workaround would be to use data.table's not-join syntax:

d[!5, mean(value), by = list(I(as.integer((date+1)/2)))]

which passes through some different logic on the way to C:

if (notjoin) {
    ... Omitted for brevity ...
    i = irows = if (length(irows)) seq_len(nrow(x))[-irows] else NULL
}
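
To illustrate what that last line does, here is a minimal base R sketch of the same conversion from "rows to drop" into positive row numbers. The helper name positive_rows is hypothetical and is not part of data.table; unlike the snippet above, it returns all rows rather than NULL when nothing is dropped.

```r
# Hypothetical helper (not a data.table function): turn "rows to drop"
# into the positive row numbers that remain.
positive_rows <- function(n, drop) {
  # seq_len(n)[-integer(0)] would select nothing at all, so the
  # empty case has to be handled separately, as the source also does.
  if (length(drop)) seq_len(n)[-drop] else seq_len(n)
}

keep <- positive_rows(5L, 5L)  # rows 1 to 4 remain
print(keep)
```

The result could then be supplied as i, for example d[positive_rows(nrow(d), 5), mean(value), by = date], so that only positive indices are involved. Whether this avoids the crash on 1.8.9 depends on the hypothesis above being right and has not been checked here.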
answered 2013-04-17T19:33:37.527

UPDATE: This has been fixed in v1.8.11. From NEWS:

Crash or incorrect aggregate results with negative indexing in i is fixed, #2697. Thanks to Eduard Antonyan (eddi) for reporting. Tests added.
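
If code has to run on machines with mixed data.table versions, a small guard along these lines might help decide whether negative indexing in i is safe to combine with by. This is only a sketch: has_negative_index_fix is a hypothetical name, and the 1.8.11 threshold is taken from the NEWS entry above.

```r
# Hypothetical helper: TRUE if the given version string is at least
# 1.8.11, the release named in the NEWS entry quoted above.
has_negative_index_fix <- function(version) {
  package_version(version) >= "1.8.11"
}

# Only query the installed version if data.table is actually available.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- as.character(packageVersion("data.table"))
  print(has_negative_index_fix(installed))
}
```

package_version compares version components numerically, so 1.8.9 is correctly treated as older than 1.8.11.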

answered 2013-09-08T12:24:49.703