python - psycopg2 COPY using cursor.copy_from() freezes with large inputs

Question

Consider the following Python code, which uses a psycopg2 cursor object (some column names were changed or omitted for clarity):

filename='data.csv'
file_columns=('id', 'node_id', 'segment_id', 'elevated', 
              'approximation', 'the_geom', 'azimuth')
self._cur.copy_from(file=open(filename),
                    table=self.new_table_name, columns=file_columns)

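The database is on a remote machine on a fast LAN.
Using \COPY from bash is very fast, even for large (about 1,000,000 lines) files.

This code is very fast for 5,000 lines, but once data.csv grows beyond 10,000 lines, the program freezes completely.

Any thoughts or solutions?

Adam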

score 5 · Accepted Answer

This is only a workaround, but you can pipe the file into psql. I sometimes use this recipe when I am too lazy to bust out psycopg2:

import subprocess

def psql_copy_from(filename, tablename, columns=None):
    """Warning, this does not properly quote things"""
    coltxt = ' (%s)' % ', '.join(columns) if columns else ''
    with open(filename) as f:
        subprocess.check_call([
            'psql',
            '-c', 'COPY %s%s FROM STDIN' % (tablename, coltxt),
            '--set=ON_ERROR_STOP=true',  # to be safe
            # add your connection args here
        ], stdin=f)
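
The docstring above warns that nothing is quoted. A minimal sketch of one way to quote the identifiers follows, assuming a plain (not schema-qualified) table name. The names quote_ident, build_psql_copy_args, psql_copy_from_quoted and the conn_args parameter are hypothetical and not part of the original recipe. Note that quoted identifiers become case-sensitive in PostgreSQL.

```python
import subprocess

def quote_ident(name):
    """Quote one SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def build_psql_copy_args(tablename, columns=None, conn_args=()):
    """Build the psql argument list for COPY ... FROM STDIN.

    conn_args is an assumed extra parameter for flags such as
    ['-h', 'dbhost', '-d', 'mydb'] (placeholder values).
    """
    if columns:
        coltxt = ' (%s)' % ', '.join(quote_ident(c) for c in columns)
    else:
        coltxt = ''
    sql = 'COPY %s%s FROM STDIN' % (quote_ident(tablename), coltxt)
    return ['psql', '--set=ON_ERROR_STOP=true', *conn_args, '-c', sql]

def psql_copy_from_quoted(filename, tablename, columns=None, conn_args=()):
    """Stream the file to psql on stdin, as the recipe above does."""
    with open(filename) as f:
        subprocess.check_call(
            build_psql_copy_args(tablename, columns, conn_args), stdin=f)
```

Splitting the command construction from the subprocess call makes it possible to inspect the generated COPY statement without a database.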

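As for the lock-up, are you using multiple threads or anything like that?

Does your postgres log show anything such as a closed connection or a deadlock? Can you see disk activity after it locks up?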
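
One way to look for a lock problem while the program is frozen is to query the pg_locks system view from a second connection. This is only a sketch: the helper name waiting_locks is hypothetical, and it assumes you can open a separate psycopg2 connection to the same database.

```python
# Lock requests that have not been granted yet; a frozen COPY waiting on
# another transaction would be expected to show up here.
UNGRANTED_LOCKS_SQL = """
    SELECT pid, locktype, mode, relation::regclass::text
    FROM pg_locks
    WHERE NOT granted
"""

def waiting_locks(cur):
    """Return one row per lock request that is still waiting."""
    cur.execute(UNGRANTED_LOCKS_SQL)
    return cur.fetchall()

# Usage (run from a second session while the first one is stuck):
#   import psycopg2
#   conn = psycopg2.connect(dsn)   # dsn: your own connection string
#   for row in waiting_locks(conn.cursor()):
#       print(row)
```

An empty result suggests the freeze is not caused by a lock wait, which points back at the client side.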

score 1 · Answer

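This is a memory limitation: "copy_from" crashes because open(filename) hands over the whole file in one go. It is a psycopg2 problem, not a PostgreSQL one, so Mike's solution is the best.

If you want to use "copy_from" with regular commits and handle duplicate keys at the same time, there is a solution: https://stackoverflow.com/a/11059350/1431079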
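
As a rough illustration of "copy_from" with regular commits, the sketch below feeds the file to the cursor in bounded chunks and commits after each one. It is not the code from the linked answer and does not handle duplicate keys. The names iter_chunks, copy_in_chunks and chunk_size are hypothetical, and the input is assumed to be tab-separated, which is the copy_from default.

```python
import io
from itertools import islice

def iter_chunks(lines, chunk_size):
    """Yield in-memory file objects holding at most chunk_size lines each."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield io.StringIO(''.join(chunk))

def copy_in_chunks(conn, filename, table, columns, chunk_size=5000, sep='\t'):
    """COPY the file chunk by chunk, committing after every chunk.

    conn is expected to be a psycopg2 connection. Returns the number of
    chunks that were sent.
    """
    cur = conn.cursor()
    sent = 0
    with open(filename) as f:
        for buf in iter_chunks(f, chunk_size):
            cur.copy_from(buf, table, sep=sep, columns=columns)
            conn.commit()  # a later failure keeps the chunks already loaded
            sent += 1
    return sent
```

Committing per chunk trades atomicity for bounded memory use: if a later chunk fails, the earlier ones stay in the table, so a retry has to skip or deduplicate them.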
