I have a huge text file (1 GB) in which every "line" follows this syntax:
[number] [number]_[number]
For example:
123 123_1234
45 456_45 12 12_12
I get the following error:
line 46, in open_delimited
pieces = re.findall(r"(\d+)\s+(\d+_\d+)", remainder + chunk, re.IGNORECASE)
TypeError: can only concatenate tuple (not "str") to tuple
with this code:
def open_delimited(filename, args):
    with open(filename, args, encoding="UTF-16") as infile:
        chunksize = 10000
        remainder = ''
        for chunk in iter(lambda: infile.read(chunksize), ''):
            pieces = re.findall(r"(\d+)\s+(\d+_\d+)", remainder + chunk, re.IGNORECASE)
            for piece in pieces[:-1]:
                yield piece
            remainder = pieces[-1]
        if remainder:
            yield remainder

filename = 'data/AllData_2000001_3000000.txt'
for chunk in open_delimited(filename, 'r'):
    print(chunk)
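
The traceback is consistent with how `re.findall` behaves when the pattern contains more than one group: it returns a list of tuples, so `remainder = pieces[-1]` turns `remainder` into a tuple, and the next `remainder + chunk` fails. Below is a minimal sketch of one possible fix, not a definitive one. It assumes the goal is to yield each `(number, number_number)` pair exactly once, and it keeps the leftover as a string by using `re.finditer` and slicing the buffer. The names `PAIR` and `iter_pairs` are hypothetical helpers introduced only for this illustration.

```python
import re

# Same pattern as in the question; PAIR is a hypothetical name for this sketch.
PAIR = re.compile(r"(\d+)\s+(\d+_\d+)")


def iter_pairs(chunks):
    """Yield (number, number_number) tuples from an iterable of text chunks."""
    remainder = ""  # always a str, never a tuple of groups
    for chunk in chunks:
        buffer = remainder + chunk
        matches = list(PAIR.finditer(buffer))
        if not matches:
            # Nothing complete yet: carry the whole buffer forward.
            remainder = buffer
            continue
        for m in matches[:-1]:
            yield m.groups()
        # The last match may be cut off by the chunk boundary, so its text
        # (and everything after it) is re-scanned together with the next chunk.
        remainder = buffer[matches[-1].start():]
    # Flush whatever is left once the input is exhausted.
    for m in PAIR.finditer(remainder):
        yield m.groups()


def open_delimited(filename, mode="r", chunksize=10000):
    """Read the file in chunks and yield the pairs found by iter_pairs."""
    with open(filename, mode, encoding="UTF-16") as infile:
        yield from iter_pairs(iter(lambda: infile.read(chunksize), ""))
```

Holding back the last match matters because a chunk can end in the middle of a number, for example `123 123_12` followed by `34` in the next chunk. One caveat of this sketch: if a long stretch of input contains no match at all, `remainder` keeps growing until a match appears.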