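I'm very new to Python, so I appreciate that my approach may be a bit rough and ready, but any help would be very welcome.
I'm trying to loop through a file of XML lines and parse the date out of one of the tags. I have the individual pieces working separately: I can read the file in, loop through it, and write to an output file, and I can also take a single line of XML and parse it to extract the date. However, when I try to combine the two by reading the file line by line and parsing each line, I get the following error: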
Traceback (most recent call last):
  File "./sadpy10.py", line 19, in <module>
    DOMTree = xml.dom.minidom.parse(line)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: '<Header><Version>1.0</Version>....<cd:Data>...</Data>..... <cd:DateReceived>20070620171524</cd:DateReceived>'
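The initial input file is report2.out. The other input file, parseoutput.out, is the same data with the mass of whitespace at the end of each line removed, because I was getting an IO error saying the line was too long. My script is as follows: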
from xml.dom.minidom import parse
import xml.dom.minidom
import datetime

f = open('report2.out','r')
file = open("parseoutput.out", "w")
for line in f:
    # I had to strip the whitespace from end of each line as I was getting error saying the lines were too long
    line = line.rstrip()
    file.write(line + '\n')

f = open("parseoutput.out","r")
for line in f:
    DOMTree = xml.dom.minidom.parse(line)
    collection = DOMTree.documentElement
    get_date = collection.getElementsByTagName("cd:DateReceived").item(0).firstChild.nodeValue
    get_date = datetime.datetime.strptime(get_date, "%Y%m%d%H%M%S").isoformat()
    get_date = get_date.replace("T"," ")
    print get_date
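
For what it's worth, the traceback suggests that xml.dom.minidom.parse() treats its argument as a filename (it ends up in open(file, 'rb')), so handing it a line of XML text fails with IOError. A minimal sketch of the likely fix is to use xml.dom.minidom.parseString(), which takes the XML text itself. The helper name extract_date and the sample line are hypothetical, made up for illustration. The sketch also assumes that each line is a complete, well-formed XML document with a single root element and that the cd: prefix is declared on it, which the truncated line in the error message does not confirm.

```python
import datetime
import xml.dom.minidom


def extract_date(xml_line):
    """Parse one line of XML text and return DateReceived as 'YYYY-MM-DD HH:MM:SS'.

    Hypothetical helper, not part of the original script.
    """
    # parseString() accepts XML text; parse() expects a filename or file object.
    dom = xml.dom.minidom.parseString(xml_line)
    nodes = dom.getElementsByTagName("cd:DateReceived")
    raw = nodes[0].firstChild.nodeValue
    parsed = datetime.datetime.strptime(raw, "%Y%m%d%H%M%S")
    # Format directly instead of isoformat() followed by replace("T", " ").
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Assumed sample line: the root element and namespace URI are invented here,
# because the real lines in report2.out are elided in the question.
sample = ('<Report xmlns:cd="urn:example:cd">'
          '<Header><Version>1.0</Version></Header>'
          '<cd:DateReceived>20070620171524</cd:DateReceived>'
          '</Report>')

print(extract_date(sample))
```

Inside the existing loop this would amount to calling extract_date(line.strip()) for each line in place of the parse(line) call. Separately, the script reopens parseoutput.out for reading without closing the handle it wrote through, so calling file.close() before the second open() is probably needed to be sure every line has been flushed to disk.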