sql - Inserting into a table efficiently in Postgres

Question

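In my database I store a tree-like data structure as a table tab with the columns id (primary key), value, from_id and depth, where depth (an integer) is the distance from the root of the tree.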

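Now I would like to add rows to tab from another table candidates (columns id, value, from_id), with two restrictions: 1) only rows with a new id, and 2) only rows whose depth is below a given threshold (e.g. 3 or 4).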

At that point there may be more than one from_id in tab pointing to a new row in candidates.
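For reference, the two tables described above might look roughly like the following. This is only a sketch: the column types (integer, text) are assumptions, since the question names the columns but not their types.

```sql
-- Assumed column types; the question only gives the column names.
CREATE TABLE tab (
    id      integer PRIMARY KEY,
    value   text,
    from_id integer,  -- id of the parent row in tab
    depth   integer   -- distance from the root of the tree
);

CREATE TABLE candidates (
    id      integer,
    value   text,
    from_id integer   -- id of the prospective parent row in tab
);
```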

As a Postgres beginner, I hope my approach is correct, but it is very inefficient:

insert into tab
select distinct c.id, c.value, c.from_id, t.depth+1 as depth
from candidates as c
join tab as t on t.id=c.from_id
where depth<3 and c.id not in
(select id from tab);

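I am looking for suggestions to speed this up. Together with two other operations in one transaction, this takes several minutes for fewer than 10k rows.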

I am working from R with the RPostgres package, but I believe this is more of an SQL / database question.

score 1 · Accepted Answer

You can try whether left joining tab and checking whether its id is NULL gives you a benefit.

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
            LEFT JOIN tab t2
                      ON t2.id = c1.id
       WHERE t1.depth + 1 < 3
             AND t2.id IS NULL;

While you are at it, try putting indexes on tab (id, depth) and on candidates (from_id).
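The suggested indexes could be created along these lines. This is a sketch under assumptions: the index names are made up for illustration, and the second index uses the from_id column named in the question.

```sql
-- tab.id already has an index through its primary key; including depth
-- may let the planner read depth from the index during the join.
CREATE INDEX tab_id_depth_idx ON tab (id, depth);

-- Supports the join from candidates to tab on the parent id.
CREATE INDEX candidates_from_id_idx ON candidates (from_id);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE tab;
ANALYZE candidates;
```

Whether the planner actually uses these indexes depends on table sizes and statistics, so it is worth checking the query plan rather than assuming a speedup.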

Another option is to use NOT EXISTS.

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
       WHERE t1.depth + 1 < 3
             AND NOT EXISTS (SELECT *
                                    FROM tab t2
                                    WHERE t2.id = c1.id);

Either way, to improve performance you will probably need to get rid of the IN clause if tab has many rows.
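One way to compare the variants is to look at their execution plans. The following is a minimal sketch, assuming the table and column names from the question. Because EXPLAIN ANALYZE really executes the INSERT, it is wrapped in a transaction that is rolled back.

```sql
BEGIN;

-- Runs the statement and reports the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO tab (id, value, from_id, depth)
SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
FROM candidates c1
     INNER JOIN tab t1 ON t1.id = c1.from_id
WHERE t1.depth + 1 < 3
  AND NOT EXISTS (SELECT 1 FROM tab t2 WHERE t2.id = c1.id);

-- Discard the inserted rows; only the plan output is of interest here.
ROLLBACK;
```

Running the same wrapper around the original NOT IN statement should show where the time goes in each case.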

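Also, get used to always writing the target columns explicitly in an INSERT statement. Otherwise the statement may break if you make changes to the target table (for example, adding a column).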
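As an illustration of why the column list matters, here is a small sketch using a hypothetical scratch table demo_tab with assumed column types. Dropping and re-adding a column changes the positional column order, which an INSERT without a column list silently depends on.

```sql
-- Hypothetical scratch table, not part of the question's schema.
CREATE TEMP TABLE demo_tab (
    id      integer PRIMARY KEY,
    value   text,
    from_id integer,
    depth   integer
);

-- A schema change that moves value to the last position.
ALTER TABLE demo_tab DROP COLUMN value;
ALTER TABLE demo_tab ADD COLUMN value text;

-- With an explicit column list the statement is unaffected by the reordering.
INSERT INTO demo_tab (id, value, from_id, depth)
VALUES (1, 'root', NULL, 0);

-- Without a column list the values are matched by position
-- (id, from_id, depth, value), so 'child' would be assigned to the
-- integer column from_id and the statement would be expected to fail.
-- INSERT INTO demo_tab VALUES (2, 'child', 1, 1);
```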
