python - Using multiple re.sub() calls on one file with Python

Question

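I have a file containing a large number of random strings. There are certain patterns I want to remove, so I decided to use regular expressions to find them. So far, this code does exactly what I want: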

#!/usr/bin/python

import csv
import re
import sys
import pdb


f=open('output.csv', 'w')

with open('retweet.csv', 'rb') as inputfile:
    read=csv.reader(inputfile, delimiter=',')
    for row in read:
        f.write(re.sub(r'@\s\w+', ' ', row[0]))
        f.write("\n")
f.close()

f=open('output2.csv', 'w')

with open('output.csv', 'rb') as inputfile2:
    read2=csv.reader(inputfile2, delimiter='\n')
    for row in read2:
        a= re.sub('[^a-zA-Z0-9]', ' ', row[0])
        b= str.split(a)
        c= "+".join(b)
        f.write("http://www.google.com/webhp#q="+c+"&btnI\n")
f.close()

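The problem is that I want to avoid repeatedly opening and closing files, because this could get messy if I need to check for more patterns. How can I perform multiple re.sub() calls on the same file and write the result, with all substitutions applied, to a new file?

Thanks for your help!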

score 3 · Accepted Answer

Apply all of the substitutions in one go on the current line:

with open('retweet.csv', 'rb') as inputfile:
    read=csv.reader(inputfile, delimiter=',')
    for row in read:
        text = row[0]
        text = re.sub(r'@\s\w+', ' ', text)
        text = re.sub(another_expression, another_replacement, text)
        # etc.
        f.write(text + '\n')
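
If the list of patterns keeps growing, one way to keep the loop body short is to hold the substitutions in a table and apply them in order. The following is a minimal sketch, not part of the original answer: the names SUBSTITUTIONS and apply_substitutions are hypothetical, and the two patterns are simply the ones from the question.

```python
import re

# Hypothetical substitution table: each entry is (compiled pattern, replacement).
# Both patterns are taken from the question; add more pairs as needed.
SUBSTITUTIONS = [
    (re.compile(r'@\s\w+'), ' '),        # "@ name" style mentions
    (re.compile(r'[^a-zA-Z0-9]'), ' '),  # anything that is not alphanumeric
]


def apply_substitutions(text, substitutions=SUBSTITUTIONS):
    """Run every (pattern, replacement) pair over text, in table order."""
    for pattern, replacement in substitutions:
        text = pattern.sub(replacement, text)
    return text
```

Order matters here: the mention pattern has to run before the alphanumeric filter, because the filter would otherwise replace the "@" and the mention would no longer match.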

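Note that opening a file with csv.reader(..., delimiter='\n') sounds very much like you are treating that file as a sequence of lines. You can iterate over the file directly instead: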

with open('output.csv', 'rb') as inputfile2:
    for line in inputfile2:
        pass  # process each line here
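
Putting both points together, the intermediate output.csv can probably be dropped entirely. The sketch below assumes Python 3 (the code in the question is Python 2 style, hence the 'rb' mode there) and assumes that the first CSV column contains no embedded newlines. The function names build_search_url and convert are hypothetical.

```python
import csv
import re


def build_search_url(text):
    """Strip "@ name" mentions, then join the remaining words into a query URL."""
    text = re.sub(r'@\s\w+', ' ', text)
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    return "http://www.google.com/webhp#q=" + "+".join(text.split()) + "&btnI"


def convert(in_path, out_path):
    """Read the first column of in_path and write one URL per row to out_path."""
    # In Python 3 the csv module expects a text-mode file opened with newline=''.
    with open(in_path, newline='') as src, open(out_path, 'w') as dst:
        for row in csv.reader(src, delimiter=','):
            if row:  # skip completely empty lines
                dst.write(build_search_url(row[0]) + "\n")
```

Calling convert('retweet.csv', 'output2.csv') would then read the input once and write the final file directly, with every substitution applied per row.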
