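I am sniffing packets on the network and recovering XML data from the raw payload using Scapy and Python. When I reassemble the frames, the resulting XML is missing some tags, so I cannot parse it with etree.parse(). Is there a way to parse the broken XML and traverse it with XPath expressions to get the data I need?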

1 Answer

I'm sure my solution is too simplistic to cover every case, but it should handle simple cases where closing tags are missing:

>>> import re
>>> from lxml import etree
>>> def fix_xml(string):
    """
    Tries to insert missing closing XML tags
    """
    error = True
    while error:
        try:
            # Put one tag per line so the parser's line numbers identify a single tag
            string = string.replace('>', '>\n').replace('\n\n', '\n')
            root = etree.fromstring(string)
            error = False
        except etree.XMLSyntaxError as exc:
            text = str(exc)
            # Raw string so \w and \d reach the regex engine unchanged
            pattern = r"Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"
            m = re.match(pattern, text)
            if m:
                # Retrieve where the error took place
                missing, l1, closing, l2, c2 = m.groups()
                l1, l2, c2 = int(l1), int(l2), int(c2)
                lines = string.split('\n')
                print('Adding closing tag <{0}> at line {1}'.format(missing, l2))
                missing_line = lines[l2 - 1]
                # Insert the missing closing tag just before the mismatched one
                lines[l2 - 1] = missing_line.replace('</{0}>'.format(closing), '</{0}></{1}>'.format(missing, closing))
                string = '\n'.join(lines)
            else:
                # Any other syntax error is not handled here
                raise
    print(string)

This seems to correctly add the missing closing tags for B and C:

>>> s = """<A>
  <B>
    <C>
  </B>
  <B></A>"""
>>> fix_xml(s)
Adding closing tag <C> at line 4
Adding closing tag <B> at line 7
<A>
  <B>
    <C>
  </C>
</B>
  <B>
</B>
</A>
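
Since the question also asks about running XPath over the damaged document, a possible alternative (not the approach above) is lxml's recovering parser. The following is only a minimal sketch, assuming lxml is installed; the sample string and the `parse_broken_xml` helper name are made up for illustration. The recovered tree is the parser's best guess and may nest elements differently from what was intended, so the result is worth inspecting before relying on it.

```python
from lxml import etree


def parse_broken_xml(text):
    """Parse possibly malformed XML (hypothetical helper, for illustration)."""
    # recover=True asks the parser to keep going after syntax errors
    # instead of raising XMLSyntaxError
    parser = etree.XMLParser(recover=True)
    return etree.fromstring(text, parser)


# Same kind of damage as in the example above: <C> and the second <B>
# are never closed
broken = "<A><B><C></B><B></A>"
root = parse_broken_xml(broken)

# XPath works on the recovered tree as on any other lxml tree
print(root.tag)
print(len(root.xpath("//B")))

# Dump the recovered structure to check how the elements were nested
print(etree.tostring(root, pretty_print=True).decode())
```

If the recovered nesting is wrong for the data at hand, explicitly repairing the tags first, as fix_xml does, gives more control over where the closing tags end up.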

Answered 2012-09-19T13:21:41.620