I want to read a very large file starting from the line that contains a specific word. What is the best way to do this?

Suppose it is a file with 50K lines:

43511
24622
53213
43534
57656
12121

I want to read the file's lines starting from the line that contains 43534. What is the most efficient way to do this for a large file?

3 Answers

3

You can use itertools.dropwhile:

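from io import StringIO
from itertools import dropwhile

t = '''43511
24622
53213
43534
57656
12121
'''

# StringIO stands in for the real file; use open(path) in practice.
with StringIO(t) as f:
    # Skip lines until the target is reached. Lines read in text mode end
    # in '\n' (not os.linesep), so strip it before comparing.
    for x in dropwhile(lambda x: x.rstrip('\n') != '43534', f):
        print(x, end='')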
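
The same idea applied to a file on disk might look like the sketch below. The helper name `lines_from` and the temporary file are assumptions made for illustration; they are not part of the answer above. Because `dropwhile` is lazy, only one line is held in memory at a time.

```python
import os
import tempfile
from itertools import dropwhile


def lines_from(path, target):
    """Yield lines of `path` starting at the first line equal to `target`."""
    with open(path, 'r') as f:
        # dropwhile discards lines lazily until the predicate turns false,
        # then passes every remaining line through unchanged.
        yield from dropwhile(lambda line: line.rstrip('\n') != target, f)


# Small demonstration: a temporary file stands in for the large one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('43511\n24622\n53213\n43534\n57656\n12121\n')

result = [line.rstrip('\n') for line in lines_from(tmp.name, '43534')]
os.remove(tmp.name)
print(result)  # ['43534', '57656', '12121']
```

Note that the target line itself is included in the output, since `dropwhile` stops dropping at the first line for which the predicate is false.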
Answered 2013-07-12T16:45:13.833
1

One way to do it manually, without loading the whole file into memory, is something like this:

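found = False
with open('file.txt', 'r') as f:
    for line in f:
        # Lines keep their trailing newline, so strip it before comparing.
        if line.rstrip('\n') == '43534':
            found = True
        if found:
            # You have now reached the target line in the file and
            # can begin processing it (and every later line) here.
            print(line, end='')
# Note: f.tell() cannot be used while iterating with `for` in Python 3;
# if you need the file position, read with f.readline() in a loop instead.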
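
If the position is what you are after, one possible sketch is to read with `readline()` in binary mode, so that `f.tell()` returns a plain byte offset, and then `seek()` to that offset on later reads. The function name `find_line_offset` and the temporary file are hypothetical, and the sketch assumes the target is ASCII text.

```python
import os
import tempfile


def find_line_offset(path, target):
    """Return the byte offset where the first line equal to `target` starts, or None."""
    wanted = target.encode('ascii')
    with open(path, 'rb') as f:
        while True:
            offset = f.tell()      # position before reading the next line
            line = f.readline()
            if not line:           # empty bytes means end of file
                return None
            if line.rstrip(b'\r\n') == wanted:
                return offset


# Small demonstration: a temporary file stands in for the large one.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'43511\n24622\n53213\n43534\n57656\n12121\n')

offset = find_line_offset(tmp.name, '43534')
with open(tmp.name, 'rb') as f:
    f.seek(offset)                 # jump straight to the target line
    rest = f.read().decode('ascii').splitlines()
os.remove(tmp.name)
print(offset, rest)  # 18 ['43534', '57656', '12121']
```

The first scan is still linear, but if the same file is read repeatedly the saved offset lets later reads skip directly to the target line.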

Hope this helps!

Answered 2013-07-12T16:38:54.667
1

Just create a boolean flag that records whether you have already read the target string you are looking for. When you reach the string, flip the flag, which triggers your script to process the rest of the file.

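test = '43534'
past_test = False
with open(fname, 'r') as f:  # fname: path to your file
    for line in f:
        if past_test:
            pass  # do stuff with each line after the target
        # Lines keep their trailing newline, so strip it before comparing.
        elif line.rstrip('\n') == test:
            past_test = True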
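
A possible variation, sketched below, avoids checking the flag on every remaining line: break out of the search loop once the target is found, then hand the rest of the file iterator to the caller. The generator name `lines_after` and the temporary file are assumptions for illustration. As with the flag version, the target line itself is not included.

```python
import os
import tempfile


def lines_after(path, target):
    """Yield the lines of `path` that come after the first line equal to `target`."""
    with open(path, 'r') as f:
        for line in f:
            if line.rstrip('\n') == target:
                break              # stop searching; f is now positioned after the target
        # The file object is its own iterator, so this resumes where the
        # loop stopped. If the target was never found, nothing is yielded.
        yield from f


# Small demonstration: a temporary file stands in for the large one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('43511\n24622\n53213\n43534\n57656\n12121\n')

after = [line.rstrip('\n') for line in lines_after(tmp.name, '43534')]
os.remove(tmp.name)
print(after)  # ['57656', '12121']
```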
Answered 2013-07-12T16:39:10.947