linux - Why doesn't PostgreSQL save to disk during a long-running INSERT … SELECT …?

Question

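I am moving a large amount of data (with a very simple date conversion on one column) into a local database using a foreign data wrapper. Using a Django cursor (because I'm too lazy to pull out the credentials to create a raw psycopg2 cursor), I run this kind of query (anonymized and with a few joins removed, but otherwise identical to the original):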

cursor.executemany(
    sql.SQL(
        """
        INSERT INTO local_table (
            foreign_key_id,
            other_foreign_key_id,
            datetime,
            comment
        )
        SELECT other_local_table.id,
               %s,
               (object_date + to_timestamp(object_time, 'HH24:MI')::time) at time zone '…',
               comment
          FROM imported_schema.remote_table
          JOIN other_local_table ON other_local_table.code = remote_table.code
        """
    ),
    [(dummy_id,)],
)

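However, the local Postgres server always gets OOM-killed after a while. I expected Postgres to flush the new rows to disk to avoid running out of memory, but as far as I can tell that is not happening: /var/lib/docker/volumes/vagrant_postgres_data only grows by a few MB, while resident memory usage grows into the GBs. The local server does not have enough RAM to hold the entire result set in memory, so I need a solution that does not involve a more expensive hardware setup.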

Do I need to set something like wal_sync_method or work_mem to make this work?

According to the docs, executemany should be the right tool for the job:

The function is mostly useful for commands that update the database: any result set returned by the query is discarded.

I'm running the Postgres 10.6 container on a Linux server and Django 2.1 locally. I'm not using any extensions other than FDW.

Explain plan:

Insert on local_table  (cost=817872.44..818779.47 rows=25915 width=56)
  ->  Subquery Scan on "*SELECT*"  (cost=817872.44..818779.47 rows=25915 width=56)
        ->  HashAggregate  (cost=817872.44..818390.74 rows=25915 width=48)
              Group Key: other_local_table.id, 1, timezone('…'::text, (remote_table.object_date + (to_timestamp((remote_table.object_time)::text, 'HH24:MI'::text))::time without time zone)), remote_table.comment
              ->  Nested Loop  (cost=101.15..807974.88 rows=989756 width=48)
                    ->  Nested Loop  (cost=0.57..60.30 rows=73 width=12)
                          ->  Nested Loop  (cost=0.29..42.35 rows=38 width=4)
                                ->  Seq Scan on fourth_local_table  (cost=0.00..7.45 rows=1 width=4)
                                      Filter: ((code)::text = '…'::text)
                                ->  Index Scan using … on third_local_table  (cost=0.29..34.49 rows=41 width=8)
                                      Index Cond: (id = fourth_local_table.id)
                          ->  Index Scan using … on other_local_table  (cost=0.29..0.45 rows=2 width=16)
                                Index Cond: (id = third_local_table.id)
                    ->  Foreign Scan on remote_table  (cost=100.58..9421.44 rows=151030 width=20)

postgresqltuner recommends that I

set vm.overcommit_memory=2 in /etc/sysctl.conf ... This will disable memory overcommitment and avoid postgresql being killed by the OOM killer.

Is that the solution?

score 1 · Accepted Answer

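I don't see anything in your execution plan other than the HashAggregate that could consume an arbitrary amount of memory, and that should be limited by work_mem.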

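To diagnose this, you should first configure your system so that you get a regular OOM error instead of invoking the OOM killer. That means setting vm.overcommit_memory = 2 with sysctl and adjusting vm.overcommit_ratio to 100 * (RAM - swap) / RAM.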
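As a rough illustration of the formula above, the ratio can be computed from the MemTotal and SwapTotal values that Linux reports in /proc/meminfo. This is only a sketch: the helper names `parse_meminfo` and `overcommit_ratio` are made up for this example, and clamping a negative result to 0 (when swap is larger than RAM) is an assumption of the sketch, not part of the advice above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def overcommit_ratio(ram, swap):
    """Return 100 * (RAM - swap) / RAM, rounded down.

    Both arguments must use the same unit. A negative result
    (swap larger than RAM) is clamped to 0; that clamp is an
    assumption of this sketch.
    """
    if ram <= 0:
        raise ValueError("ram must be positive")
    return max(100 * (ram - swap) // ram, 0)


if __name__ == "__main__":
    # Read the real values on a Linux host and print sysctl-style lines.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    ratio = overcommit_ratio(info["MemTotal"], info["SwapTotal"])
    print("vm.overcommit_memory = 2")
    print(f"vm.overcommit_ratio = {ratio}")
```

For example, a machine with 8 GB of RAM and 2 GB of swap gives a ratio of 75, and a machine without swap gives 100.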

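When the server gets an OOM error, it dumps the current memory contexts and their sizes to the PostgreSQL log. That should give an indication of where the memory is going. If in doubt, add it to the question.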
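Once such a dump is in the log, a short script can rank the contexts by size. The following is a minimal sketch that assumes the dump lines look like "Name: N total in B blocks; F free (C chunks); U used"; the sample text and its numbers are invented for illustration and are not taken from the asker's server, and `largest_contexts` is a hypothetical helper name.

```python
import re

# Matches one memory context line of the assumed dump format.
CONTEXT_RE = re.compile(
    r"^\s*(?P<name>.+?): (?P<total>\d+) total in \d+ blocks; "
    r"\d+ free \(\d+ chunks\); (?P<used>\d+) used"
)

# Invented sample, for illustration only.
SAMPLE_DUMP = """\
TopMemoryContext: 68688 total in 5 blocks; 16696 free (15 chunks); 51992 used
  TopTransactionContext: 8192 total in 1 blocks; 7384 free (0 chunks); 808 used
  ExecutorState: 1073741824 total in 137 blocks; 4096 free (3 chunks); 1073737728 used
"""


def largest_contexts(log_text, top=5):
    """Return up to `top` (name, total_bytes) pairs, largest first."""
    rows = []
    for line in log_text.splitlines():
        match = CONTEXT_RE.match(line)
        if match:
            rows.append((match.group("name"), int(match.group("total"))))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for name, total in largest_contexts(SAMPLE_DUMP):
        print(f"{total:>12}  {name}")
```

Lines that do not match the pattern are skipped, so the whole log file can be passed in without trimming it first.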

Are you using any third-party extensions?

