Today I stumbled upon a peculiar behavior of the xml.dom and xpath modules, and it took me a while to figure out that it is related to XML namespaces:
from xml.dom import minidom
import xpath
zooXml = """<?xml version="1.0" encoding="utf-8"?>
<Zoo xmlns='http://foo.bar/zoo'>
<Compound><Chimp/></Compound>
</Zoo>"""
mydom = minidom.parseString(zooXml)
compound = xpath.findnode('/Zoo/Compound', mydom)
print compound.toxml() # as expected: <Compound><Chimp/></Compound>
print xpath.find("Chimp", compound) # as expected: [<DOM Element: Chimp at 0x24c0cc8>]
So far so good. But if I now add another Chimp element without explicitly specifying its namespace, xpath does not find the new element:
newChimp = mydom.createElement("Chimp")
compound.appendChild(newChimp)
print compound.toxml() # ok, two chimps now: <Compound><Chimp/><Chimp/></Compound>
print xpath.find("Chimp", compound) # wait a second, that's still only one chimp: [<DOM Element: Chimp at 0x24a0d88>]
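To narrow this down without the xpath module, here is a minimal sketch that uses only xml.dom.minidom and inspects the namespaceURI attribute of each node. The variable names are mine, the XML is condensed to one line so that no whitespace text nodes get in the way, and it uses print() calls so it should also run on Python 3. It suggests that the two Chimp elements differ in the DOM even though they serialize identically:

```python
from xml.dom import minidom

# Same document as above, without whitespace between elements.
ZOO_XML = "<Zoo xmlns='http://foo.bar/zoo'><Compound><Chimp/></Compound></Zoo>"

dom = minidom.parseString(ZOO_XML)
compound = dom.getElementsByTagName("Compound")[0]
parsed_chimp = compound.firstChild            # the Chimp that came from the parser

plain_chimp = dom.createElement("Chimp")      # created without a namespace
compound.appendChild(plain_chimp)

parsed_ns = parsed_chimp.namespaceURI         # inherited from the xmlns declaration
plain_ns = plain_chimp.namespaceURI           # createElement() leaves this unset
serialized = compound.toxml()

print(parsed_ns)
print(plain_ns)
print(serialized)
```

If xpath compares namespaceURI when matching names (an assumption on my part, I have not checked its source), that difference would explain the missing element.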
After re-parsing the modified XML, xpath finds both elements:
mydom = minidom.parseString(mydom.toxml())
compound = xpath.findnode('/Zoo/Compound', mydom)
print xpath.find("Chimp", compound) # now it finds both chimps: [<DOM Element: Chimp at 0x24c9808>, <DOM Element: Chimp at 0x24c9888>]
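The effect of the round trip can be sketched with the standard library alone. This is only an illustration under the assumption that the parser assigns the default namespace to every element it reads; the names before and after are hypothetical helpers for the comparison:

```python
from xml.dom import minidom

ZOO_NS = "http://foo.bar/zoo"
dom = minidom.parseString(
    "<Zoo xmlns='%s'><Compound><Chimp/></Compound></Zoo>" % ZOO_NS)
compound = dom.getElementsByTagName("Compound")[0]
compound.appendChild(dom.createElement("Chimp"))   # no namespace given

# Namespace of each Chimp in the live, modified DOM.
before = [node.namespaceURI for node in compound.childNodes]

# Serialize and parse again, as in the snippet above.
reparsed = minidom.parseString(dom.toxml())
compound_again = reparsed.getElementsByTagName("Compound")[0]
after = [node.namespaceURI for node in compound_again.childNodes]

print(before)
print(after)
```

Serializing drops the distinction between the two elements, so the parser sees two identical Chimp tags under the same xmlns declaration and treats them alike.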
Also, if I create the new element with a namespace, xpath finds it without re-parsing:
babyChimp = mydom.createElementNS(mydom.firstChild.namespaceURI, "Chimp")
compound.appendChild(babyChimp)
print xpath.find("Chimp", compound) # that worked: [<DOM Element: Chimp at 0x24c9808>, <DOM Element: Chimp at 0x24c9888>, <DOM Element: Chimp at 0x24c9548>]
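For comparison, minidom itself offers both a namespace-unaware and a namespace-aware lookup. The sketch below uses getElementsByTagName and getElementsByTagNameNS as stand-ins for the xpath query; whether xpath behaves like the namespace-aware variant is my assumption, not something I have confirmed:

```python
from xml.dom import minidom

ZOO_NS = "http://foo.bar/zoo"
dom = minidom.parseString(
    "<Zoo xmlns='%s'><Compound><Chimp/></Compound></Zoo>" % ZOO_NS)
compound = dom.getElementsByTagName("Compound")[0]

compound.appendChild(dom.createElement("Chimp"))            # no namespace
compound.appendChild(dom.createElementNS(ZOO_NS, "Chimp"))  # explicit namespace

# Matches on the tag name only, so all Chimp elements are returned.
by_tag_name = compound.getElementsByTagName("Chimp")
# Matches on namespace URI plus local name, so the plain element is skipped.
by_namespace = compound.getElementsByTagNameNS(ZOO_NS, "Chimp")

print(len(by_tag_name))
print(len(by_namespace))
```

The namespace-aware lookup skips the element made with createElement(), which mirrors what I see from xpath.find().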
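The question is: is this behavior correct, or is it a bug? Shouldn't the namespace of Chimp be implicit? After all, the resulting XML is the same regardless of whether I use xml.dom.createElement() or xml.dom.createElementNS(). And if it is a bug, where is it: in xml.dom or in xpath?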
FWIW: I observed this behavior with the Windows distributions of Python 2.7.5 and 2.7.4; in both cases I used version 0.1 of the xpath module.