c# - PostgreSQL: Inserting large amounts of data into multiple tables with foreign keys

Question

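I'm working on a project that involves inserting a large amount of data into three main tables over the course of a day. All three tables are linked to each other.

Here are the tables: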

event
    user_id
    event_time
    event_id (PRIMARY) (Serial Int)

subevent
    subevent_type
    subevent_value
    subevent_id (PRIMARY) (Serial Int)

event_relationship
    event_id (1)
    subevent_id (MANY)

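Events can happen at any time, and when they do I need to record the details of their subevents and insert them into the database. A single event can have anywhere from 5 to 500 subevents. The reason I have a relationship table rather than just a foreign key column on subevent is that other processes add subevent rows that have no parent event. Confusing, perhaps.

By the end of a day I may have inserted as many as 10 million subevents and 250,000 events, so speed is one of the most important things for me. One of the best ways I have found to insert them all together is the DO $$ DECLARE ... END$$; command. I can declare temporary integer variables, capture the ids of the events and subevents I insert, and then insert those ids together into the event_relationship table.

This is the code I am currently running, which is executed as PL/pgSQL: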

DO $$ DECLARE _new_event_id INTEGER; _new_subevent_id INTEGER;
BEGIN
    INSERT INTO event (user_id, event_time) VALUES (@user_id, @event_time)
    RETURNING event_id INTO _new_event_id;

    INSERT INTO subevent (subevent_type, subevent_value)
    VALUES (@subevent_type, @subevent_value)
    RETURNING subevent_id INTO _new_subevent_id;

    INSERT INTO event_relationship VALUES (_new_event_id, _new_subevent_id);

END$$;

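(The first insert happens only once; the last two inserts are repeated for every subevent. I'm using C# and NpgSql to execute the command, and I can build the command dynamically while the process is running.)

However, over the course of a day this gets bogged down, and my data starts backing up to the point where I can't insert it fast enough. I'm wondering whether I'm taking the wrong approach here, or whether there is another way to do what I'm already doing, only faster.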

score 1 · Accepted Answer

You can have a foreign key relationship and insert null in the referencing table:

create table t (i int primary key);
create table t2 (i int references t (i));

insert into t2 (i) values (null);
INSERT 0 1

insert into t2 (i) values (1);
ERROR:  insert or update on table "t2" violates foreign key constraint "t2_i_fkey"
DETAIL:  Key (i)=(1) is not present in table "t".

Or keep a special value, such as zero or -1, in the referenced table for the "orphan" subevents.
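Applied to the tables in the question, the nullable foreign key idea might look like the sketch below. It is only an illustration: the column types, the sample values, and the choice to put event_id directly on subevent (dropping event_relationship) are assumptions, since the question does not give them.

```sql
-- Assumed column types; the question only lists column names.
CREATE TABLE event (
    event_id   serial PRIMARY KEY,
    user_id    integer NOT NULL,
    event_time timestamptz NOT NULL
);

CREATE TABLE subevent (
    subevent_id    serial PRIMARY KEY,
    subevent_type  text NOT NULL,
    subevent_value integer NOT NULL,
    -- Nullable foreign key: NULL marks a subevent with no parent event.
    event_id       integer REFERENCES event (event_id)
);

-- An orphan subevent, as written by one of the other processes.
INSERT INTO subevent (subevent_type, subevent_value, event_id)
VALUES ('heartbeat', 1, NULL);

-- One event and its subevents in a single statement.
-- The data-modifying CTE returns the new event_id, which is
-- attached to every row of the VALUES list.
WITH new_event AS (
    INSERT INTO event (user_id, event_time)
    VALUES (42, now())
    RETURNING event_id
)
INSERT INTO subevent (subevent_type, subevent_value, event_id)
SELECT v.subevent_type, v.subevent_value, e.event_id
FROM new_event AS e
CROSS JOIN (VALUES ('click', 10), ('scroll', 20))
     AS v (subevent_type, subevent_value);
```

With this layout each subevent costs one row instead of two, at the price of changing the schema the other processes already write to.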

To reduce the load, you can insert the subevents in a single command:

insert into subevent (subevent_type, subevent_value) values
(@subevent1_type, @subevent1_value),
(@subevent2_type, @subevent2_value);
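If the event_relationship table has to stay, the multi-row insert can probably be combined with data-modifying CTEs so that one statement fills all three tables. This is a sketch rather than a tested replacement for the DO block: the column types, the foreign keys on event_relationship, and the literal values are assumptions.

```sql
-- Assumed definitions of the three tables from the question.
CREATE TABLE event (
    event_id   serial PRIMARY KEY,
    user_id    integer NOT NULL,
    event_time timestamptz NOT NULL
);
CREATE TABLE subevent (
    subevent_id    serial PRIMARY KEY,
    subevent_type  text NOT NULL,
    subevent_value integer NOT NULL
);
CREATE TABLE event_relationship (
    event_id    integer NOT NULL REFERENCES event (event_id),
    subevent_id integer NOT NULL REFERENCES subevent (subevent_id)
);

-- One statement per event: insert the event, insert all of its
-- subevents with a multi-row VALUES list, then link every new
-- subevent_id to the new event_id.
WITH new_event AS (
    INSERT INTO event (user_id, event_time)
    VALUES (42, now())
    RETURNING event_id
), new_subevents AS (
    INSERT INTO subevent (subevent_type, subevent_value)
    VALUES ('click', 10),
           ('scroll', 20),
           ('hover', 30)
    RETURNING subevent_id
)
INSERT INTO event_relationship (event_id, subevent_id)
SELECT e.event_id, s.subevent_id
FROM new_event AS e
CROSS JOIN new_subevents AS s;
```

The CROSS JOIN is safe here because new_event always holds exactly one row, so each new subevent is linked once. From C#, the literals would be replaced by parameters in the same way as in the original command.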

score 0 · Accepted Answer

Since you're using NpgSql, I assume you are a .NET developer.

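If the bottleneck is in creating the SQL commands, here is an article with some tips on improving insert performance: http://visualstudiomagazine.com/articles/2006/03/01/5-surefire-adonet-performance-tips.aspx. One technique it mentions is the DbCommand.Prepare() method, which I think is similar to the "prepared statements" that wildplasser mentioned.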
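For reference, a prepared statement can also be written at the SQL level with PREPARE and EXECUTE. The sketch below is only a rough analogue of what DbCommand.Prepare() is expected to achieve, not a description of what NpgSql does internally; the statement name insert_subevent and the column types are assumptions.

```sql
-- Assumed definition of the subevent table from the question.
CREATE TABLE subevent (
    subevent_id    serial PRIMARY KEY,
    subevent_type  text NOT NULL,
    subevent_value integer NOT NULL
);

-- Parse and plan the insert once per session.
PREPARE insert_subevent (text, integer) AS
    INSERT INTO subevent (subevent_type, subevent_value)
    VALUES ($1, $2)
    RETURNING subevent_id;

-- Run it repeatedly with different parameter values.
EXECUTE insert_subevent('click', 10);
EXECUTE insert_subevent('scroll', 20);

-- Release the prepared statement when it is no longer needed.
DEALLOCATE insert_subevent;
```

A prepared statement only exists for the session that created it, so each connection has to prepare it separately.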

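If the bottleneck is in the actual inserts, consider using multiple connections to the database server and doing the work in multiple threads.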
