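I want to read a very large file starting from the line that contains a specific word. What is the best way to do this?
Suppose it is a file with 50K lines: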
43511
24622
53213
43534
57656
12121
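I want to read the file's lines starting from the line that contains 43534. What is the most efficient way to do this for a large file?
You can use itertools.dropwhile: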
t = '''43511
24622
53213
43534
57656
12121
'''
from io import StringIO
from itertools import dropwhile

# StringIO stands in for a real file object here
with StringIO(t) as f:
    # compare against '\n', not os.linesep: text-mode reads
    # normalize line endings to '\n' by default
    for x in dropwhile(lambda x: x != '43534\n', f):
        print(x, end='')
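For a file on disk rather than an in-memory string, the same dropwhile idea might look like the sketch below. The function name read_from_marker and its parameters are made up for illustration, and stripping the line ending before comparing assumes that a "match" means the whole line equals the marker.

```python
from itertools import dropwhile


def read_from_marker(path, marker):
    """Yield lines of `path` (without line endings), starting at the
    first line whose content equals `marker`. Hypothetical helper."""
    with open(path, "r") as f:
        # dropwhile discards lines lazily until the predicate turns False,
        # then passes every remaining line through unchanged
        for line in dropwhile(lambda l: l.rstrip("\r\n") != marker, f):
            yield line.rstrip("\r\n")
```

Both the file object and dropwhile are lazy, so only one line is held in memory at a time. Usage would be something like: for line in read_from_marker('file.txt', '43534'): ...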
One way to do it manually, without loading the whole file into memory, is something like this:
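found = False
with open('file.txt', 'r') as f:
    for line in f:
        # strip the trailing newline before comparing
        if line.rstrip('\n') == '43534':
            found = True
        if found:
            # you have now reached the target line in the file,
            # so you can begin processing it here
            print(line, end='')  # placeholder for your own processing
            # note: f.tell() is not reliable inside this for loop
            # (Python 3 raises OSError in text mode); read with
            # f.readline() instead if you need the position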
Hope this helps!
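If you do need the position, as the comments above mention, one option is to read with readline() in binary mode so that tell() reports exact byte offsets, and then seek() straight to that offset on later reads. A rough sketch, where find_offset and read_from_offset are hypothetical helper names and UTF-8 is an assumed encoding:

```python
def find_offset(path, marker, encoding="utf-8"):
    """Return the byte offset of the first line equal to `marker`,
    or None if no such line exists. Hypothetical helper."""
    target = marker.encode(encoding)
    with open(path, "rb") as f:
        while True:
            pos = f.tell()       # offset of the line about to be read
            line = f.readline()
            if not line:         # end of file: marker not found
                return None
            if line.rstrip(b"\r\n") == target:
                return pos


def read_from_offset(path, offset, encoding="utf-8"):
    """Yield decoded lines of `path`, starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)           # jump directly, no rescanning
        for line in f:
            yield line.rstrip(b"\r\n").decode(encoding)
```

The scan still costs one pass through the file the first time, but if the file does not change you can store the offset and skip the scan on later runs.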
Just create a boolean flag that records whether you have already read the target string you are looking for. When you reach the string, flip the flag so that your script processes the rest of the file.
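test = '43534'
fname = 'file.txt'  # placeholder: path to your file
past_test = False
with open(fname, 'r') as f:
    for line in f:
        if past_test:
            pass  # do stuff with the lines after the target
        elif line.rstrip('\n') == test:
            # strip the newline so the comparison can match
            past_test = True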