我有一个项目需要使用 python 将十几个正则表达式应用于大约 100 个文件。在网上搜索了 4 个多小时的各种组合,包括“(merge|concatenate|stack|join|compile) multiple regex in python”,但我还没有找到任何关于我需要的帖子。
这对我来说是一个中等规模的项目。我需要几个较小的正则表达式项目,它们只需要 5-6 个正则表达式模式,仅应用于十几个文件。虽然这些将对我的工作有很大帮助,但祖父项目是一个应用 100 多个搜索的文件,将字符串替换为我得到的任何新文件。(某些语言的拼写约定不是标准化的,能够快速处理文件将提高工作效率。)
理想情况下,正则表达式字符串需要可由非程序员更新,但这可能超出了本文的范围。
这是我到目前为止所拥有的:
import os, re, sys # Is "sys" necessary?
path = "/Users/mypath/testData"
myfiles = os.listdir(path)
for f in myfiles:
# split the filename and file extension for use in renaming the output file
file_name, file_extension = os.path.splitext(f)
generated_output_file = file_name + "_regex" + file_extension
# Only process certain types of files.
if re.search("txt|doc|odt|htm|html")
# Declare input and output files, open them, and start working on each line.
input_file = os.path.join(path, f)
output_file = os.path.join(path, generated_output_file)
with open(input_file, "r") as fi, open(output_file, "w") as fo:
for line in fi:
# I realize that the examples are not regex, but they are in my real data.
# The important thing, is that each of these is a substitution.
line = re.sub(r"dog","cat" , line)
line = re.sub(r"123", "789" , line)
# Etc.
# Obviously this doesn't work, because it is only writing the last instance of line.
fo.write(line)
fo.close()