python - Delete all lines matching a regex using Python

Question

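I am trying to delete every line that my regex matches (the regex simply looks for any line containing yahoo). Each match is on its own line, so the multiline option is not needed.

This is what I have so far...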

import re
inputfile = open('C:\\temp\\Scripts\\remove.txt','w',encoding="utf8")

inputfile.write(re.sub("\[(.*?)yahoo(.*?)\n","",inputfile))

inputfile.close()

I get the following error:

Traceback (most recent call last):
  line 170, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

score 17 · Accepted Answer

Use the fileinput module if you want to modify the original file:

import re
import fileinput
for line in fileinput.input(r'C:\temp\Scripts\remove.txt', inplace = True):
   if not re.search(r'\byahoo\b', line):
      print(line, end="")
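
Note that `\byahoo\b` only matches yahoo as a whole word, while the question asks for any line containing yahoo. The following is a minimal sketch of the difference on in-memory lines; `keep_lines` and the sample data are hypothetical names introduced for illustration and are not part of the answer above.

```python
import re

def keep_lines(lines, pattern):
    """Return the lines that do NOT match pattern (hypothetical helper)."""
    matched = re.compile(pattern).search
    return [line for line in lines if not matched(line)]

# Made-up sample data, used only to compare the two patterns.
sample = ["mail.yahoo.com\n", "yahoomail.example\n", "example.org\n"]

# \b requires a word boundary on both sides, so "yahoomail" is not matched.
whole_word = keep_lines(sample, r"\byahoo\b")

# A bare substring pattern matches any line that contains "yahoo".
substring = keep_lines(sample, r"yahoo")

print(whole_word)
print(substring)
```

If lines such as "yahoomail" should also be removed, dropping the `\b` anchors is probably what you want.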

score 6 · Accepted Answer

Here is a Python 3 variant of @Ashwini Chaudhary's answer that removes all lines containing a regex pattern from a given filename:

#!/usr/bin/env python3
"""Usage: remove-pattern <pattern> <file>"""
import fileinput
import re
import sys

def main():
    pattern, filename = sys.argv[1:] # get pattern, filename from command-line
    matched = re.compile(pattern).search
    with fileinput.FileInput(filename, inplace=1, backup='.bak') as file:
        for line in file:
            if not matched(line): # save lines that do not match
                print(line, end='') # this goes to filename due to inplace=1

main()

It assumes locale.getpreferredencoding(False) == input_file_encoding; otherwise it might break on non-ASCII characters.
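
As a rough illustration of that assumption, the sketch below prints the locale's preferred encoding and checks whether some UTF-8 bytes can be decoded with a given codec. The helper `can_decode` and the sample bytes are hypothetical and only serve the demonstration.

```python
import locale

# Encoding that open() and fileinput fall back to when none is given
# (the value depends on the machine this runs on).
print(locale.getpreferredencoding(False))

# Bytes as they would be stored in a UTF-8 encoded file (made-up sample).
data = "yahoo café\n".encode("utf-8")

def can_decode(raw, encoding):
    """Return True if raw decodes cleanly with encoding (hypothetical helper)."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

print(can_decode(data, "utf-8"))
print(can_decode(data, "ascii"))
```

A codec that rejects the bytes raises UnicodeDecodeError. Some single-byte codecs may instead decode them without error and yield the wrong characters, which is harder to notice.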

To make it work regardless of the current locale, or for input files that use a different encoding:

#!/usr/bin/env python3
import os
import re
import sys
from tempfile import NamedTemporaryFile

def main():
    encoding = 'utf-8'
    pattern, filename = sys.argv[1:]
    matched = re.compile(pattern).search
    with open(filename, encoding=encoding) as input_file:
        with NamedTemporaryFile(mode='w', encoding=encoding,
                                dir=os.path.dirname(filename),
                                delete=False) as outfile:
            for line in input_file:
                if not matched(line):
                    print(line, end='', file=outfile)
    os.replace(outfile.name, input_file.name)

main()

score 5 · Accepted Answer

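You have to read the file first. Try something like:

import re

path = 'C:\\temp\\Scripts\\remove.txt'

# Read the whole file first: re.sub needs a string, not a file object,
# and opening with 'w' would truncate the file before it could be read.
with open(path, encoding="utf8") as inputfile:
    text = inputfile.read()

# Remove every line the pattern matches, then write the result back.
with open(path, 'w', encoding="utf8") as outputfile:
    outputfile.write(re.sub(r"\[(.*?)yahoo(.*?)\n", "", text))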
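
The pattern in the question starts with `\[`, so it only removes text from a literal `[` up to the end of the line. If the goal is to drop whole lines containing a word, one possible approach is a single `re.sub` with `re.MULTILINE` on the text read from the file. This is a sketch under that assumption; `drop_matching_lines` and the sample text are hypothetical.

```python
import re

def drop_matching_lines(text, word):
    """Remove every line of text that contains word (hypothetical helper)."""
    # With re.MULTILINE, ^ matches at the start of each line. '.' does not
    # match a newline, so '.*' stays within one line; '\n?' also consumes
    # the line ending, if there is one.
    pattern = re.compile(r"^.*" + re.escape(word) + r".*\n?", re.MULTILINE)
    return pattern.sub("", text)

# Made-up sample text standing in for the file contents.
text = "a@yahoo.com\nb@example.org\nc@yahoo.co.uk\n"
result = drop_matching_lines(text, "yahoo")
print(result)
```

Because this works on the whole file in memory, it suits small files. For large files, the line-by-line answers above are likely the better fit.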
