I'm trying to do a non-equi self-join with data.table on a table with 2 million rows and 8 variables. The data looks like this:
db table:
product position_min position_max count_pos
A.16 167804 167870 20
A.18 167804 167838 15
A.15 167896 167768 18
A.20 238359 238361 33
A.35 167835 167837 8
dt table:
product_t position_min_t position_max_t count_pos_t
A.16 167804 167870 20
A.18 167804 167838 15
A.15 167896 167768 18
A.20 238359 238361 33
A.35 167835 167837 8
This is the code I'm using:
db_join <- db[dt, .(product, product_t, position_min_t, position_max_t, count_pos_t), on = .(position_min <= position_min_t, position_max >= position_max_t)]
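For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two sample tables above and runs the same join. It assumes the position columns are integers and that dt is simply a copy of db with a _t suffix on every column name, which matches the sample but may not match the full data.

```r
library(data.table)

# Sample rows copied from the db table above
db <- data.table(
  product      = c("A.16", "A.18", "A.15", "A.20", "A.35"),
  position_min = c(167804L, 167804L, 167896L, 238359L, 167835L),
  position_max = c(167870L, 167838L, 167768L, 238361L, 167837L),
  count_pos    = c(20L, 15L, 18L, 33L, 8L)
)

# Assumption: dt is db with "_t" appended to every column name
dt <- copy(db)
setnames(dt, paste0(names(dt), "_t"))

# Same non-equi join as in the question: for each row of dt, keep the rows
# of db whose interval starts at or before it and ends at or after it
db_join <- db[dt,
  .(product, product_t, position_min_t, position_max_t, count_pos_t),
  on = .(position_min <= position_min_t, position_max >= position_max_t)]

print(db_join)
```

Because both conditions are non-strict (<= and >=), every row also matches itself in this join.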
I expect to get:
A.16 A.18 167804 167838 15
A.16 A.15 167896 167768 18
A.16 A.35 167835 167837 8
A.18 A.35 167835 167837 8
But I keep getting this error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
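I did set allow.cartesian to TRUE and added by = .EACHI, but it still doesn't work. I tried the same code on a subset of the table with 1.6 million rows and it worked like a charm. Do you know how to solve this? Any help would be much appreciated.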
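For scale: the error fires once the result would exceed 2^31 (about 2.1 billion) rows. With 2 million rows on each side, that is an average of roughly 1,074 matches per row of dt, so heavily overlapping intervals could plausibly get there. A rough way to check, sketched below on the sample data, is to run the join over slices of dt and add up the row counts. The names chunk_size and count_matches are hypothetical, and whether this is fast enough on the full table is an open question.

```r
library(data.table)

# Sample rows from the question; the real table has about 2 million rows
db <- data.table(
  product      = c("A.16", "A.18", "A.15", "A.20", "A.35"),
  position_min = c(167804L, 167804L, 167896L, 238359L, 167835L),
  position_max = c(167870L, 167838L, 167768L, 238361L, 167837L),
  count_pos    = c(20L, 15L, 18L, 33L, 8L)
)
# Assumption: dt is db with "_t" appended to every column name
dt <- setnames(copy(db), paste0(names(db), "_t"))

# Hypothetical helper: number of join rows produced by one slice of dt.
# nomatch = 0L drops dt rows without any match instead of keeping an NA row.
count_matches <- function(idx) {
  res <- db[dt[idx], .(product, product_t),
            on = .(position_min <= position_min_t, position_max >= position_max_t),
            nomatch = 0L]
  as.numeric(nrow(res))  # double, so the running total cannot overflow
}

# Hypothetical slice size; something much larger would be used on real data
chunk_size <- 2L
idx_list <- split(seq_len(nrow(dt)), ceiling(seq_len(nrow(dt)) / chunk_size))

total_rows <- sum(vapply(idx_list, count_matches, numeric(1)))
print(total_rows)
print(total_rows > 2^31)  # TRUE would mean the full join cannot fit in one result
```

Each slice is joined on its own, so no single call has to allocate the whole result; only the counts are kept.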