
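I have a huge data file (~2 GB) that I need to split into odd and even lines, process each set separately, and write them to two files. I don't want to read the whole file into RAM, so I think generators should be a suitable choice. In short, I want to do something like this: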

lines = (l.strip() for l in open(inputfn))
oddlines = somefunction(getodds(lines))
evenlines = somefunction(getevens(lines))
outodds.write(oddlines)
outevens.write(evenlines)

Is this possible? Obviously indexing doesn't work:

In [75]: lines[::2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>()
----> 1 lines[::2]

TypeError: 'generator' object is not subscriptable

3 Answers

def oddlines(fileobj):
    return (line for index,line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    return (line for index,line in enumerate(fileobj) if not index % 2)

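Note that this requires scanning the file twice, since these generators aren't designed to run in parallel: once the first one is exhausted, the file object is at EOF, so you need to reopen the file (or call fileobj.seek(0)) before the second pass. It does, however, lead to less complex code. (Also note that the "odd" lines here are those with indexes 1, 3, 5, ..., which means that, because of zero-indexing, the first line is an "even" line.)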
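A minimal sketch of the two-pass approach, assuming an in-memory io.StringIO stands in for the real file; the helper name split_two_pass is hypothetical. It shows the rewind with seek(0) that the second pass needs, because the first pass leaves the file object at EOF.

```python
import io

def oddlines(fileobj):
    # Lines with index 1, 3, 5, ...
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Lines with index 0, 2, 4, ...
    return (line for index, line in enumerate(fileobj) if not index % 2)

def split_two_pass(fileobj, odd_out, even_out):
    # Hypothetical helper: the first pass writes the even-indexed lines.
    even_out.writelines(evenlines(fileobj))
    # The first pass consumed the file; rewind it before the second pass.
    fileobj.seek(0)
    odd_out.writelines(oddlines(fileobj))

src = io.StringIO("a\nb\nc\nd\ne\n")
odds, evens = io.StringIO(), io.StringIO()
split_two_pass(src, odds, evens)
print(repr(evens.getvalue()))  # 'a\nc\ne\n'
print(repr(odds.getvalue()))   # 'b\nd\n'
```

Without the seek(0) call, the odd output would simply be empty.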

As Ashwini says, you can also use itertools.islice to do this.

answered 2013-08-04T20:45:18.017

Use itertools.islice to slice the iterator:

from itertools import islice
with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)
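To see what the start and step arguments select without touching the real file, here is a small sketch that uses io.StringIO as a stand-in for the opened file (an assumption for illustration only). Each slice gets a fresh file object, just as the answer reopens 'filename' for each pass.

```python
import io
from itertools import islice

data = "line0\nline1\nline2\nline3\nline4\n"

# islice(iterable, start, stop, step): stop=None means "until exhausted".
evens = list(islice(io.StringIO(data), 0, None, 2))
odds = list(islice(io.StringIO(data), 1, None, 2))

print(evens)  # ['line0\n', 'line2\n', 'line4\n']
print(odds)   # ['line1\n', 'line3\n']
```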
answered 2013-08-04T20:45:19.600

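If you only want to read the file once, write a generator that wraps the file and yields a flag indicating whether the line is even or odd, together with the actual line read from the file.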

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even
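A quick check of what oddeven yields, using a plain list in place of the file (any iterable of lines works). With the default even=True, the first line is flagged as even, which matches the zero-based convention used in the first answer.

```python
def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even

# A list of strings stands in for the file object here.
pairs = list(oddeven(["first\n", "second\n", "third\n"]))
print(pairs)  # [(True, 'first\n'), (False, 'second\n'), (True, 'third\n')]
```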

Usage:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)

This can be simplified further by storing the output file objects in an indexable container:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
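The indexing trick works because bool is a subclass of int in Python: False acts as index 0 (oddfile) and True as index 1 (evenfile). A minimal single-pass sketch, assuming io.StringIO objects in place of the real input and output files; the wrapper name split_single_pass is hypothetical.

```python
import io

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even

def split_single_pass(infile, oddfile, evenfile):
    # Hypothetical wrapper: False == 0 selects oddfile, True == 1 selects evenfile.
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)

infile = io.StringIO("1\n2\n3\n4\n5\n")
oddfile, evenfile = io.StringIO(), io.StringIO()
split_single_pass(infile, oddfile, evenfile)
print(repr(evenfile.getvalue()))  # '1\n3\n5\n'
print(repr(oddfile.getvalue()))   # '2\n4\n'
```

The input is read exactly once, and only one line is held in memory at a time.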
answered 2013-08-04T21:35:27.390