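Hi, I want to generate a list of numbers from 1000000 to 2000000, but I get a MemoryError. I was using random and everything worked fine, except that I got duplicate numbers, and I can't have duplicates, so I switched to xrange.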

data = []
total = 2000000
def resource_file(info):
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        for row in reader:
            try:
                for i in xrange(1000000,total):
                    new_row = [row[0], row[1], i]
                    data.append(new_row)
            except IndexError as error:
                print(error)
    with open(work_dir + "new_data.csv", "w") as new_data:
        writer = csv_writer(new_data, delimiter=",")
        for new_row in data:
            writer.writerow(new_row)

1 Answer

Repeating each row with an extra column ranging over 1M..2M

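The problem is that you first store all of the generated rows in memory. Python objects carry a fair amount of per-object overhead to begin with, and a million entries per input row is quite large in any case.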
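To get a rough feel for the size involved, here is a minimal sketch that estimates the memory used by the rows generated for a single input line. It assumes CPython (sizes from sys.getsizeof differ between versions and platforms), the field values "field_a" and "field_b" are made up for illustration, and the estimate ignores the outer list's own storage, so treat the result as a lower bound.

```python
import sys

# One sample row built the way the question does: two CSV fields plus a counter.
sample_row = ["field_a", "field_b", 1000000]

# Shallow size of the list object plus the size of the integer it holds.
# The two strings are shared between all rows generated from the same
# input line, so they are not counted again here.
per_row_bytes = sys.getsizeof(sample_row) + sys.getsizeof(sample_row[2])

# Number of rows generated per input line in the question.
rows_per_input_line = 2000000 - 1000000
estimate_bytes = per_row_bytes * rows_per_input_line

print("about %d bytes per generated row" % per_row_bytes)
print("about %.0f MiB per input line" % (estimate_bytes / 2**20))
```

Multiplying that figure by the number of lines in the input file shows why the list outgrows the available memory long before anything is written.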

I suggest not storing the data in a list, but writing each row to the file immediately:

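# Assumed import: the question does not show how csv_reader/csv_writer are defined.
from csv import reader as csv_reader, writer as csv_writer

total = 2000000
def resource_file(info):
    # work_dir is assumed to be defined elsewhere, as in the question.
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            for row in reader:
                rowa, rowb = row[0:2]
                # Write each generated row right away instead of collecting it in a list.
                # xrange is Python 2; on Python 3 use range, which is also lazy.
                for data in xrange(1000000,total):
                    writer.writerow([rowa,rowb,data])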
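The same streaming idea can be tried on a small scale. The following is only a sketch for Python 3: the helper name expand_rows and its parameters are hypothetical, in-memory io.StringIO objects stand in for the real files, and skipping rows with fewer than two fields is an assumption about how short rows should be handled.

```python
import csv
import io


def expand_rows(src, dst, start, stop):
    """Write each input row once per number in [start, stop), without buffering."""
    reader = csv.reader(src, delimiter=",")
    writer = csv.writer(dst, delimiter=",")
    for row in reader:
        if len(row) < 2:
            # Assumption: skip short rows instead of raising an error.
            continue
        first, second = row[0], row[1]
        for number in range(start, stop):
            # Each row goes straight to the output; nothing accumulates in memory.
            writer.writerow([first, second, number])


# Tiny demonstration with a range of 3 numbers instead of 1,000,000.
source = io.StringIO("a,b\nc,d\n")
target = io.StringIO()
expand_rows(source, target, 1000000, 1000003)
output_lines = target.getvalue().splitlines()
print(output_lines)
```

With real files, the two StringIO objects would be replaced by the handles returned from open(), and memory use stays flat regardless of how large the range is.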

Taking rows 1M-2M of the file

If instead you want to take rows 1M to 2M of the original file, you can write:

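# Assumed import: the question does not show how csv_reader/csv_writer are defined.
from csv import reader as csv_reader, writer as csv_writer
from itertools import islice

total = 2000000
def resource_file(info):
    # work_dir is assumed to be defined elsewhere, as in the question.
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            # islice skips the first 1000000 rows lazily, then yields rows up to index total - 1.
            for row in islice(reader,1000000,total):
                writer.writerow(row)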

Or, as @JonClemens says, you can simplify it:

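# Assumed import: the question does not show how csv_reader/csv_writer are defined.
from csv import reader as csv_reader, writer as csv_writer
from itertools import islice

total = 2000000
def resource_file(info):
    # work_dir is assumed to be defined elsewhere, as in the question.
    with open(info, "r") as data_file:
        reader = csv_reader(data_file, delimiter=",")
        with open(work_dir + "new_data.csv", "w") as new_data:
            writer = csv_writer(new_data, delimiter=",")
            # writerows consumes the lazy slice one row at a time.
            writer.writerows(islice(reader,1000000,total))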
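To check which rows islice actually selects, here is a small Python 3 sketch. The helper name copy_row_range is hypothetical and in-memory io.StringIO objects stand in for the real files. Note that the indices are zero-based and that a header line, if the file has one, counts as row 0.

```python
import csv
import io
from itertools import islice


def copy_row_range(src, dst, start, stop):
    """Copy the rows with zero-based index start..stop-1 from src to dst."""
    reader = csv.reader(src, delimiter=",")
    writer = csv.writer(dst, delimiter=",")
    # islice reads and discards the first `start` rows lazily,
    # then yields rows until index `stop` is reached.
    writer.writerows(islice(reader, start, stop))


# Ten sample rows: row0,x through row9,x.
source = io.StringIO("".join("row%d,x\n" % i for i in range(10)))
target = io.StringIO()
copy_row_range(source, target, 3, 6)
copied = target.getvalue().splitlines()
print(copied)
```

If the file has fewer rows than `stop`, islice simply ends early, so the output contains whatever rows exist from `start` onward.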
answered 2017-10-29T08:40:06.923