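I am sniffing packets on the network and using Scapy and Python to recover XML data from the raw payload. When I assemble the frames, the XML data I get is missing some tags, so I cannot parse the XML file with etree.parse(). Is there a way to parse the broken XML file and traverse it with XPath expressions to get the data I want?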
I'm sure my solution is too simple to cover every case, but it should handle the simple cases where a closing tag is missing:
>>> import re
>>> from lxml import etree
>>> def fix_xml(string):
    """
    Tries to insert missing closing XML tags
    """
    error = True
    while error:
        try:
            # Put one tag per line so that lxml's line numbers identify a single tag
            string = string.replace('>', '>\n').replace('\n\n', '\n')
            root = etree.fromstring(string)
            error = False
        except etree.XMLSyntaxError as exc:
            text = str(exc)
            pattern = r"Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"
            m = re.match(pattern, text)
            if m:
                # Retrieve where the error took place
                missing, l1, closing, l2, c2 = m.groups()
                l1, l2, c2 = int(l1), int(l2), int(c2)
                lines = string.split('\n')
                print('Adding closing tag <{0}> at line {1}'.format(missing, l2))
                missing_line = lines[l2 - 1]
                # Insert the missing closing tag just before the mismatched one
                lines[l2 - 1] = missing_line.replace('</{0}>'.format(closing), '</{0}></{1}>'.format(missing, closing))
                string = '\n'.join(lines)
            else:
                # Any other syntax error is not handled here
                raise
    print(string)
This seems to correctly add the missing closing tags for B and C:
>>> s = """<A>
<B>
<C>
</B>
<B></A>"""
>>> fix_xml(s)
Adding closing tag <C> at line 4
Adding closing tag <B> at line 7
<A>
<B>
<C>
</C>
</B>
<B>
</B>
</A>
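The same idea (insert whatever closing tags are missing, then parse) can also be sketched without relying on the wording of the parser's error message. The following is only a rough illustration, not the method above: it uses the standard library's xml.etree.ElementTree instead of lxml, and the names close_missing_tags and TAG_RE are made up for this example. It assumes the only damage is missing closing tags, and it does not handle comments, CDATA sections, or attribute values containing '>'.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper pattern: matches opening, closing and self-closing tags.
# Assumption: no comments, CDATA, or '>' inside attribute values.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.:\-]*)([^<>]*?)(/?)>")


def close_missing_tags(text):
    """Insert missing closing tags by tracking open elements on a stack."""
    out = []
    stack = []
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])  # copy the text between tags
        pos = m.end()
        is_close, name, _attrs, self_closing = m.groups()
        if self_closing:
            out.append(m.group(0))
        elif not is_close:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close every element still open inside the one being closed.
            while stack[-1] != name:
                out.append("</{0}>".format(stack.pop()))
            stack.pop()
            out.append(m.group(0))
        # A closing tag with no matching open element is silently dropped.
    out.append(text[pos:])
    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</{0}>".format(stack.pop()))
    return "".join(out)


broken = "<A>\n<B>\n<C>\n</B>\n<B></A>"
root = ET.fromstring(close_missing_tags(broken))
# ElementTree supports a limited XPath subset through findall().
c_elements = root.findall("./B/C")
```

Once the repaired string parses, the resulting tree can be queried with path expressions as usual. With lxml, root.xpath() would give full XPath support instead of the subset ElementTree offers.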
answered 2012-09-19T13:21:41.620