如果您需要提前查看(一次读取几个额外的字符),您需要一个缓冲文件对象。下面的类就是这样做的:
import io
class AlphaPeekReader(io.BufferedReader):
def readalpha(self, count):
"Read one character, and peek ahead (count - 1) *extra* characters"
val = [self.read1(1)]
# Find first alpha character
while not val[0].isalpha():
if val == ['']:
return '' # EOF
val = [self.read1(1)]
require = count - len(val)
peek = self.peek(require * 3) # Account for a lot of garbage
if peek == '': # EOF
return val[0]
for c in peek:
if c.isalpha():
require -= 1
val.append(c)
if not require:
break
# There is a chance here that there were not 'require' alpha chars in peek
# Return anyway.
return ''.join(val)
这会尝试在您正在阅读的一个字符之外找到额外的字符,但不能保证它能够满足您的要求。如果我们在文件末尾或者下一个块中有很多非字母文本,它可能会读取更少。
用法:
with AlphaPeekReader(io.open(filename, 'rb')) as alphafile:
alphafile.readalpha(3)
演示,使用带有示例输入的文件:
>>> f = io.open('/tmp/test.txt', 'rb')
>>> alphafile = AlphaPeekReader(f)
>>> alphafile.readalpha(3)
'abc'
>>> alphafile.readalpha(3)
'bcc'
>>> alphafile.readalpha(3)
'ccd'
>>> alphafile.readalpha(10)
'cdfdenhjcd'
>>> alphafile.readalpha(10)
'dfdenhjcde'
要readalpha()
在循环中使用调用,您可以分别获取每个字符以及接下来的两个 2 字节,请使用iter()
带有标记的 :
for alpha_with_extra in iter(lambda: alphafile.readalpha(3), ''):
# Do something with alpha_with_extra