python - Iterating over a CSV file in batches in Python

Question

I am trying to speed up loading a large CSV file into a MySQL database. With this code, loading a 4GB file takes about 4 hours:

import csv
from datetime import datetime

# source, mydb and cursor are defined earlier in the script
with open(source) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the header row
    insert_sql = """ INSERT INTO billing_info_test (InvoiceId, PayerAccountId, LinkedAccountId) VALUES (%s, %s, %s) """
    for row in csv_reader:
        # one round trip to the server per row
        cursor.execute(insert_sql, row)
        print(cursor.rowcount, 'inserted with LinkedAccountId', row[2], 'at', datetime.now().isoformat())
    print("Committing the DB")
    mydb.commit()
cursor.close()
mydb.close()

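I would like to use executemany() to speed this up. For that, you have to pass a list of tuples as the second argument.

If I build the list up on every row iteration, it grows too large, I get an out-of-memory error, and the script crashes.

I can't get the length of csv_reader or csv_file to use in a range statement.

How can I loop through the CSV file 1000 rows at a time, store those rows in a list, pass it to executemany, then store the next 1000 rows, and so on until the end of the CSV file?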

score 1 · Accepted Answer

If you need high-speed inserts into MySQL, you can try:

LOAD DATA LOCAL INFILE '/path/to/my_file.csv' INTO TABLE my_table;
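
One thing to watch: LOAD DATA expects tab-separated fields by default, so a comma-separated file with a header row usually needs extra clauses. Below is a minimal sketch of how such a statement could be assembled from Python. The helper name build_load_data_sql is hypothetical, and the table and column names are taken from the question. LOCAL loading also has to be permitted on both the server and the client, which this sketch does not cover.

```python
def build_load_data_sql(csv_path, table, columns, skip_header=True):
    """Build a LOAD DATA LOCAL INFILE statement for a comma-separated file.

    Hypothetical helper for illustration. The values are interpolated
    directly into the SQL text, so only pass trusted, hard-coded names.
    """
    column_list = ", ".join(columns)
    ignore = "IGNORE 1 LINES " if skip_header else ""
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' "
        f"INTO TABLE {table} "
        # Override the tab-separated default for a CSV file
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"{ignore}"
        f"({column_list})"
    )


sql = build_load_data_sql(
    "/path/to/my_file.csv",
    "billing_info_test",
    ["InvoiceId", "PayerAccountId", "LinkedAccountId"],
)
print(sql)
```

The resulting string can then be passed to cursor.execute(sql) on a connection that allows LOCAL INFILE.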

score 0 · Accepted Answer

A small hint:

In [1]: import itertools

In [2]: rows = iter(range(10))

In [3]: while True:
   ...:     batch = [*itertools.islice(rows, 3)]
   ...:     if not batch:
   ...:         break
   ...:     print(batch)
   ...:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

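That said, I agree with @heliosk that LOAD DATA INFILE is the better solution for large files. You may also want to disable the table's keys until the import has finished.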
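
If you do want to stay with executemany(), the islice hint above can be applied to the csv reader directly. The following is a rough sketch, not a tested loader: iter_batches and load_csv_in_batches are made-up names, and cursor is assumed to be a DB-API cursor like the one in the question. Only one batch of rows is held in memory at a time.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size=1000):
    """Yield lists of up to batch_size items from any iterable."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return  # the iterator is exhausted
        yield batch


def load_csv_in_batches(csv_file, cursor, insert_sql, batch_size=1000):
    """Insert the rows of an open CSV file in batches; return the row count."""
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader, None)  # skip the header row, if there is one
    total = 0
    for batch in iter_batches(csv_reader, batch_size):
        # executemany() takes a sequence of parameter tuples
        cursor.executemany(insert_sql, [tuple(row) for row in batch])
        total += len(batch)
    return total
```

With the names from the question, this would be called as load_csv_in_batches(csv_file, cursor, insert_sql) inside the with block, followed by mydb.commit(). Whether to commit once at the end or after each batch depends on how much of a partial load you are willing to keep if the script fails midway.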
