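I'm new to Python; this is only the second time I've coded in it. The main purpose of this script is to take a text file containing thousands of lines of file names (the sNotUsed file) and match them against about 50 XML files. Each XML file may contain up to thousands of lines and is formatted like most XML. I'm not sure what is wrong with the code so far. It isn't complete, since I haven't added the part that writes the output back to the XML files, but the current last line should print at least once. It doesn't.
Examples of the two file formats follow.
Text file: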
fileNameWithoutExtension1
fileNameWithoutExtension2
fileNameWithoutExtension3
etc.
XML file:
<blocks>
<more stuff="name">
<Tag2>
<Tag3 name="Tag3">
<!--COMMENT-->
<fileType>../../dir/fileNameWithoutExtension1</fileType>
<fileType>../../dir/fileNameWithoutExtension4</fileType>
</blocks>
My code so far:
import os
import re

sNotUsed = list()
sFile = open(r"C:\Users\xxx\Desktop\sNotUsed.txt", "r")  # open sNotUsed txt file
for lines in sFile:
    sNotUsed.append(lines)
#sNotUsed = sFile.readlines()  # read all lines and assign to list
sFile.close()  # close file

xmlFiles = list()  # list of xmlFiles in directory
usedS = list()  # list of S files that do not match against sFile txt
search = r"\w/([\w\-]+)"

# getting the list of xmlFiles
filelist = os.listdir(r'C:\Users\xxx\Desktop\dir')
for files in filelist:
    if files.endswith('.xml'):
        xmlFile = open(files, "r+")  # open first file with read + write access
        xmlComp = xmlFile.readlines()  # read lines and assign to list
        for lines in xmlComp:  # iterate by line in list of lines
            temp = re.findall(search, lines)
            #print(temp)
            if temp:
                if temp[0] in sNotUsed:
                    print("yes")  # debugging. I know there is at least one match for sure, but this is not being printed.
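For reference, this is what I expect the pattern to return on lines like the ones in the sample XML above, which is why I assumed `temp[0]` would be a bare file name. It is only a small check on hard-coded strings (the `samples` list is made up from the example), not the real files:

```python
import re

search = r"\w/([\w\-]+)"  # same pattern as in the script

# Hard-coded lines taken from the sample XML, for illustration only.
samples = [
    '<Tag3 name="Tag3">',
    '<!--COMMENT-->',
    '<fileType>../../dir/fileNameWithoutExtension1</fileType>',
    '</blocks>',
]

# findall returns a list of captured groups; it is empty when nothing matches.
results = [re.findall(search, line) for line in samples]
for line, found in zip(samples, results):
    print("%s -> %s" % (line, found))
```

Only the `<fileType>` line yields a capture; tags, comments, and closing tags give an empty list because the character before their `/` is not a word character.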
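To help clarify: sorry, I guess my question wasn't very clear. I want the script to go through each XML file line by line and check whether the FILENAME part of that line exactly matches a line in the sNotUsed.txt file. If there is a match, I want to delete that line from the XML. If the line doesn't match any line in sNotUsed.txt, I want it to be part of the newly modified XML output (which will overwrite the old file). Let me know if this is still unclear.
Edit, working code: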
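To make the intended behavior concrete, here is a minimal sketch of the per-line decision, run on in-memory lists instead of real files. The function name `filter_lines` and the sample data are hypothetical and only for illustration; the regex is the one from my script.

```python
import re

# Same pattern as in the script: captures the run of word characters or
# hyphens that follows a "word character + slash".
SEARCH = re.compile(r"\w/([\w\-]+)")


def filter_lines(xml_lines, not_used):
    """Return the lines to keep; drop a line only if its file name is in not_used."""
    not_used = set(not_used)  # set membership is an exact, whole-string comparison
    kept = []
    for line in xml_lines:
        found = SEARCH.findall(line)
        if found and found[0] in not_used:
            continue  # exact match with an unused name: leave this line out
        kept.append(line)  # no file name on the line, or a name that is still used
    return kept


# Hypothetical sample data modeled on the XML example.
sample = [
    '<blocks>\n',
    '<!--COMMENT-->\n',
    '<fileType>../../dir/fileNameWithoutExtension1</fileType>\n',
    '<fileType>../../dir/fileNameWithoutExtension4</fileType>\n',
    '</blocks>\n',
]
result = filter_lines(sample, ['fileNameWithoutExtension1'])
print(''.join(result))
```

With this input, the line for `fileNameWithoutExtension1` should be dropped, while the tags, the comment, and the `fileNameWithoutExtension4` line are kept.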
import os
import re
import codecs

sFile = open(r"C:\Users\xxx\Desktop\sNotUsed.txt", "r")  # open sNotUsed txt file
sNotUsed = sFile.readlines()  # read all lines and assign to list
sFile.close()  # close file

search = re.compile(r"\w/([\w\-]+)")
sNotUsed = [x.strip().replace(',', '') for x in sNotUsed]  # drop newlines, surrounding whitespace, and commas
directory = r'C:\Users\xxx\Desktop\dir'
filelist = os.listdir(directory)  # getting the list of xmlFiles

# for each file in the list
for files in filelist:
    if files.endswith('.xml'):  # make sure it is an XML file
        xmlFile = codecs.open(os.path.join(directory, files), "r", encoding="UTF-8")  # open the file for reading
        xmlComp = xmlFile.readlines()  # read lines and assign to list
        print(xmlComp)  # debugging
        xmlFile.close()  # closing the file since the lines have already been read and assigned to a variable
        xmlEdit = codecs.open(os.path.join(directory, files), "w", encoding="UTF-8")  # opening the same file again and overwriting all existing lines
        for lines in xmlComp:  # iterate by line in list of lines
            #headerInd = re.search(search, lines)  # used to get the headers, comments, and ending blocks
            temp = search.findall(lines)  # finds all strings that match the regular expression compiled above and makes a list for each
            if temp:  # if the list is not empty
                if temp[0] not in sNotUsed:  # if the first (and only) value in each list is not in the sNotUsed list
                    xmlEdit.write(lines)  # write it in the file
            else:  # if the list is empty
                xmlEdit.write(lines)  # write it (used to preserve the beginning and ending blocks of the XML, as well as comments)
        xmlEdit.close()  # close the rewritten file so everything is flushed to disk
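As far as I can tell, two changes made the difference compared with my first version: stripping the trailing newline from every entry read from sNotUsed.txt, and joining the directory onto each name returned by `os.listdir` (which returns bare file names, so this matters whenever the script runs from another working directory). Here is a small sketch of the first point. `io.StringIO` stands in for the real text file, which is an assumption made only so the demo needs no file on disk.

```python
import io

# io.StringIO stands in for sNotUsed.txt (assumption for this demo only).
fake_file = io.StringIO(u"fileNameWithoutExtension1\nfileNameWithoutExtension2\n")

raw = fake_file.readlines()        # each entry keeps its trailing "\n"
clean = [x.strip() for x in raw]   # what the working version compares against

name = u"fileNameWithoutExtension1"  # what the regex captures from an XML line

in_raw = name in raw      # compares against "fileNameWithoutExtension1\n"
in_clean = name in clean  # compares against "fileNameWithoutExtension1"
print("match in raw list: %s, match in stripped list: %s" % (in_raw, in_clean))
```

The captured name never carries a newline, so a list membership test against the unstripped lines cannot succeed, which would explain why "yes" was never printed.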