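If you want to remove the extra lines:
To do that, check two conditions for each line and keep the line if either one holds: the line is not followed by an empty line, or the line is preceded by a line that matches the regex ^\d{2}:\d{2},\d{3}\s$.
To get at the next line on each iteration, create a second iterator named temp from the main file object with itertools.tee and call next on it. Use re.match to test the regex.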
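from itertools import tee
import re

with open('ex.txt') as f, open('new.txt', 'w') as out:
    # temp is a second iterator over the same lines, kept one line ahead of f
    temp, f = tee(f)
    next(temp, None)
    pre = ''  # previous line; empty before the first iteration
    for line in f:
        # the last line has no next line, so treat the end of file as blank
        if next(temp, '\n') != '\n' or re.match(r'^\d{2}:\d{2},\d{3}\s$', pre):
            out.write(line)
        pre = line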
Result:
1
17:02,111
Problem report related to
2
17:05,223
Restarting the systems
3
18:02,444
Must erase hard disk
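The filter can also be tried without any files. The following is a minimal sketch of the same lookahead idea, assuming the lines are already in a list. The helper name drop_extra_lines and the sample data are made up for illustration, and the end of the input is treated like a blank line.

```python
from itertools import tee
import re

TIMESTAMP = re.compile(r'^\d{2}:\d{2},\d{3}\s$')

def drop_extra_lines(lines):
    """Hypothetical helper: keep a line if the next line is not blank,
    or if the previous line is a timestamp."""
    current, lookahead = tee(lines)
    next(lookahead, None)  # move the lookahead iterator one line ahead
    kept = []
    previous = ''
    for line in current:
        following = next(lookahead, '\n')  # end of input counts as blank
        if following != '\n' or TIMESTAMP.match(previous):
            kept.append(line)
        previous = line
    return kept

# Made-up sample: "router" is the extra line in the first block.
sample = [
    '1\n', '17:02,111\n', 'Problem report related to\n', 'router\n', '\n',
    '2\n', '17:05,223\n', 'Restarting the systems\n',
]
print(''.join(drop_extra_lines(sample)), end='')
```

With this rule only a line that sits directly before a blank line (or at the end of the input) is dropped, so it handles one extra line per block.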
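If you want to join the rest onto the third line:
If you want to concatenate the lines that come after the third line onto the third line, you can use the following regex to find all the blocks that are followed by \n\n or by the end of the file ($):
r"(.*?)(?=\n\n|$)"
Then split each block on the line that has the time format and write the parts to the output file. Note that you need to replace the newlines in the third part with spaces: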
ex.txt:
1
17:02,111
Problem report related to
router
another line

2
17:05,223
Restarting the systems

3
18:02,444
Must erase hard disk
now due to compromised data
line 5
line 6
line 7
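Demo:
import re

def splitter(s):
    # yield each block that ends at a blank line or at the end of the input
    for x in re.finditer(r"(.*?)(?=\n\n|$)", s, re.DOTALL):
        # drop newlines left over from the separator; skip empty matches
        g = x.group(0).strip('\n')
        if g:
            yield g

with open('ex.txt') as f, open('new.txt', 'w') as out:
    for block in splitter(f.read()):
        # the capturing group keeps the timestamp line in the split result
        first, second, third = re.split(r'(\d{2}:\d{2},\d{3}\n)', block)
        out.write(first + second + third.replace('\n', ' ') + '\n')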
Result:
1
17:02,111
Problem report related to router another line
2
17:05,223
Restarting the systems
3
18:02,444
Must erase hard disk now due to compromised data line 5 line 6 line 7
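The demo relies on each block unpacking into exactly three parts. That works because re.split keeps the text matched by a capturing group in its result. A small sketch on a single block, with the block string typed in by hand for illustration:

```python
import re

# One block from ex.txt, written out as a string for illustration.
block = '1\n17:02,111\nProblem report related to\nrouter\nanother line'

# The parentheses make re.split return the timestamp line as its own item.
parts = re.split(r'(\d{2}:\d{2},\d{3}\n)', block)
print(parts)
# ['1\n', '17:02,111\n', 'Problem report related to\nrouter\nanother line']

# Only the third part has its newlines replaced with spaces.
joined = parts[0] + parts[1] + parts[2].replace('\n', ' ')
print(joined)
```

If a block contained no timestamp line, or more than one, the split would not produce three parts and the unpacking in the demo would raise a ValueError.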
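Note:
In this answer the splitter function returns a generator, which yields the blocks one at a time instead of building a list of all of them. That helps when you deal with huge files and do not want to keep lines you no longer need in memory, although f.read() still loads the whole file contents first.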
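Because the demo calls f.read(), the whole file is still read into memory before splitter runs. If that is a concern, the blocks can be built line by line instead. The following is an alternative sketch, not the method used above: it groups lines with itertools.groupby and assumes that the blank lines between blocks are completely empty. The names iter_blocks and join_block are hypothetical.

```python
import io
import re
from itertools import groupby

TIMESTAMP = re.compile(r'\d{2}:\d{2},\d{3}$')

def iter_blocks(lines):
    """Hypothetical helper: yield one list of lines per block,
    where blocks are separated by empty lines."""
    stripped = (line.rstrip('\n') for line in lines)
    # bool('') is False, so runs of non-empty lines form the blocks
    for is_text, group in groupby(stripped, key=bool):
        if is_text:
            yield list(group)

def join_block(block):
    """Join everything after the timestamp line into a single line."""
    for i, line in enumerate(block):
        if TIMESTAMP.match(line):
            head, rest = block[:i + 1], block[i + 1:]
            return (head + [' '.join(rest)]) if rest else head
    return block  # no timestamp found: leave the block unchanged

# io.StringIO stands in for an open file; both are iterated line by line.
sample = io.StringIO(
    '1\n17:02,111\nProblem report related to\nrouter\n\n'
    '2\n17:05,223\nRestarting the systems\n'
)
for block in iter_blocks(sample):
    print('\n'.join(join_block(block)))
```

Only one block is held in memory at a time, at the cost of giving up the single regex over the whole text.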