
I am trying to do a non-equi self-join with data.table on a table that has 2 million rows and 8 variables. The data looks like this:

db table:
product     position_min     position_max      count_pos
A.16        167804              167870              20
A.18        167804              167838              15
A.15        167896              167768              18
A.20        238359              238361              33
A.35        167835              167837              8

dt table:
product_t   position_min_t     position_max_t      count_pos_t
A.16        167804              167870              20
A.18        167804              167838              15
A.15        167896              167768              18
A.20        238359              238361              33
A.35        167835              167837              8

Here is the code I am using:

db_join <- db[dt, .(product, product_t, position_min_t, position_max_t, count_pos_t), on = .(position_min <= position_min_t, position_max >=  position_max_t)]
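
To make this reproducible, here is a minimal sketch that rebuilds the two sample tables above and runs the same join on them. It assumes dt is simply db with a `_t` suffix on every column name, and the final filter that drops self-matches is my assumption about what the expected output implies, not something the join does on its own.

```r
library(data.table)

# Sample rows copied from the db table above
db <- data.table(
  product      = c("A.16", "A.18", "A.15", "A.20", "A.35"),
  position_min = c(167804, 167804, 167896, 238359, 167835),
  position_max = c(167870, 167838, 167768, 238361, 167837),
  count_pos    = c(20, 15, 18, 33, 8)
)

# Assumption: dt holds the same rows, with "_t" appended to each column name
dt <- copy(db)
setnames(dt, paste0(names(db), "_t"))

# Same non-equi join as in the question: the left side of each
# condition is a column of db, the right side is a column of dt
db_join <- db[dt,
  .(product, product_t, position_min_t, position_max_t, count_pos_t),
  on = .(position_min <= position_min_t, position_max >= position_max_t)]

# Assumption: a product matching itself is not wanted in the result
db_join <- db_join[product != product_t]
print(db_join)
```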

I should get:

A.16        A.18        167804              167838              15
A.16        A.15        167896              167768              18
A.16        A.35        167835              167837              8
A.18        A.35        167835              167837              8

But I keep getting this error:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.

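I did set allow.cartesian to TRUE and added by=.EACHI, and it still does not work. I tried the same code on a subset of the table with 1.6 million rows and it worked like a charm. Do you know how to solve this? Any help would be much appreciated.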
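
For anyone who wants to gauge how large the join result would be before building it, here is a rough sketch on the sample rows only. It uses by=.EACHI with `.N` so that each row of dt yields a single count instead of all of its matching rows. The names `match_counts` and `total_rows` are made up for illustration, and I have not confirmed that this runs within memory on the full 2 million rows.

```r
library(data.table)

# Sample rows copied from the question
db <- data.table(
  product      = c("A.16", "A.18", "A.15", "A.20", "A.35"),
  position_min = c(167804, 167804, 167896, 238359, 167835),
  position_max = c(167870, 167838, 167768, 238361, 167837),
  count_pos    = c(20, 15, 18, 33, 8)
)

# Assumption: dt is db with "_t" appended to each column name
dt <- copy(db)
setnames(dt, paste0(names(db), "_t"))

# One output row per row of dt; N is the number of db rows it joins to
match_counts <- db[dt, .N, by = .EACHI,
  on = .(position_min <= position_min_t, position_max >= position_max_t)]

# Sum as double so the total itself cannot overflow an integer
total_rows <- sum(as.numeric(match_counts$N))

# TRUE would mean the full join cannot fit in a single data.table
total_rows > .Machine$integer.max
```

If the total is far above the number of rows expected, the join conditions are probably matching much more broadly than intended.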
