I have some XML snippets like this:

<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_birthday>1979-09-23</player_birthday>
    <player_name>Orene Ai'i</player_name>
    <player_team>Blues</player_team>
    <player_id>453</player_id>
    <player_height>170</player_height>
    <player_position>F&W</player_position>   <---- a '&' here.
    <player_weight>75</player_weight>
</record>

Is there any way to check whether an XML snippet is well-formed? Is there also a way to validate the XML against a DTD or an XML Schema?

For various reasons, I can't use any third-party packages.

For example, the XML above is not well-formed because it contains a bare '&'. Note that the DOCTYPE declaration refers to a DTD.

2 Answers

Just try to parse it with ElementTree (xml.etree.ElementTree.fromstring); if the XML is not well-formed, it will raise an error.

>>> a = """<record>
...     <player_birthday>1979-09-23</player_birthday>
...     <player_name>Orene Ai'i</player_name>
...     <player_team>Blues</player_team>
...     <player_id>453</player_id>
...     <player_height>170</player_height>
...     <player_position>F&W</player_position>   <---- a '&' here.
...     <player_weight>75</player_weight>
... </record>"""
>>> 
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
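A minimal sketch of wrapping that check in a reusable function, assuming Python 3, where ET.ParseError exposes the error location as a (line, column) tuple in its .position attribute. The function name check_well_formed is hypothetical. Keep in mind that expat, the parser underneath ElementTree, is non-validating: it catches well-formedness errors such as the bare '&', but it does not load or check the DTD named in the DOCTYPE.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Hypothetical helper: return (True, None) if text parses as XML,
    otherwise (False, (line, column)) for the first error expat reports.

    This only checks well-formedness; the DTD referenced by a DOCTYPE
    declaration is neither fetched nor enforced.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple locating the error.
        return False, err.position
    return True, None


snippet = """<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
    <player_position>F&W</player_position>
</record>"""

print(check_well_formed(snippet))  # fails on line 3, where the bare '&' is
print(check_well_formed(snippet.replace("F&W", "F&amp;W")))  # → (True, None)
```

Escaping the ampersand as &amp; is enough to make the snippet well-formed; the DOCTYPE line itself does not cause an error.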
answered 2012-12-06T11:27:34.437

You can use Python's xml.dom.minidom XML parser (it's in the standard library, but it isn't as powerful as alternatives such as lxml).

Just do:

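import xml.dom.minidom

# This sample string is itself not well-formed (<My> and <XML> are opened
# but never closed), so this call raises xml.parsers.expat.ExpatError.
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')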

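If the XML is not well-formed, you'll get an xml.parsers.expat.ExpatError. (Like ElementTree, minidom does not validate against a DTD; it only checks well-formedness.)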
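A minimal sketch of turning that exception into a readable message, assuming Python 3.6+ (for the f-string). ExpatError carries the numeric expat error code plus the line and column where parsing stopped, and xml.parsers.expat.errors.messages maps the code to its text. The helper name check_well_formed_minidom is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, errors


def check_well_formed_minidom(text):
    """Hypothetical helper: return None if text is well-formed XML,
    otherwise a short description of the first error expat reports."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        # err.code is expat's numeric error code; err.lineno and err.offset
        # give the line and column where the parser stopped.
        message = errors.messages[err.code]
        return f"{message} at line {err.lineno}, column {err.offset}"
    return None


print(check_well_formed_minidom("<My><XML><String/><XML/><My/>"))
print(check_well_formed_minidom("<My><XML><String/></XML></My>"))  # → None
```

The second string closes every element, so it parses cleanly.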

answered 2012-12-06T11:27:15.413