python - lxml[.objectify] documentElement tagName

Question

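I am receiving data packets in XML format, each with a specific documentRoot tag, and I'd like to delegate to specialized methods that handle these packets based on the root tag name. This works with xml.dom.minidom, like so: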

from xml.dom import minidom

dom = minidom.parseString(the_data)
root = dom.documentElement
# look up the handler method named after the root tag, e.g. elem_foo
deleg = getattr(self, 'elem_' + str(root.tagName))
deleg(dom)

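However, I'd like to simplify things (in other parts of the code, not here) by using the more Pythonic lxml.objectify.

The problem is that I don't know how to get "root.tagName" with lxml, preferably strictly with lxml.objectify. Any ideas?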

score 3 · Accepted Answer

With the help of the lxml docs and the dir() built-in, I managed to produce this:

>>> from lxml import objectify
>>> from io import BytesIO
>>> tree = objectify.parse(BytesIO(b'<parent><child>Billy</child><child>Bob</child></parent>'))
>>> root = tree.getroot()
>>> root.tag
'parent'
>>> [(foo.tag, foo.text) for foo in root.getchildren()]
[('child', 'Billy'), ('child', 'Bob')]
>>>
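
If the packet is already in memory as a string, objectify.fromstring() hands back the root element directly, so the file-like wrapper is not needed. One thing to watch: when the root element is namespaced, .tag comes back in Clark notation ('{uri}name'), and etree.QName can pull out just the local part. A minimal sketch, assuming a made-up packet and namespace URI:

```python
from lxml import etree, objectify

# Hypothetical packet; the namespace URI is invented for illustration.
the_data = (b'<ns:parent xmlns:ns="urn:example">'
            b'<ns:child>Billy</ns:child></ns:parent>')

# fromstring() returns the root element itself, not an ElementTree.
root = objectify.fromstring(the_data)

full_tag = root.tag                        # Clark notation, includes the namespace
local_name = etree.QName(root).localname   # tag name without the namespace

# Without a namespace, .tag is already the plain name.
plain_tag = objectify.fromstring(b'<parent/>').tag
```

For packets that never use namespaces, root.tag alone is enough.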

Looks like you need something like:

deleg = getattr(self,'elem_' + str(root.tag))
deleg(tree)
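
Put together, the delegation could look roughly like the sketch below. The PacketHandler class, the handle() method, and the sample packets are hypothetical; only the elem_ prefix comes from the question. It passes the root element rather than the tree to the handlers, and it fails with a clear error when no handler exists for a root tag:

```python
from lxml import etree, objectify


class PacketHandler:
    """Hypothetical handler class; elem_* naming follows the question."""

    def handle(self, the_data):
        root = objectify.fromstring(the_data)
        # Drop any namespace so '{uri}ping' and 'ping' both map to elem_ping.
        name = etree.QName(root).localname
        deleg = getattr(self, 'elem_' + name, None)
        if deleg is None:
            raise ValueError('no handler for root element %r' % name)
        return deleg(root)

    def elem_ping(self, root):
        return 'pong'

    def elem_greeting(self, root):
        # objectify exposes child elements as attributes of the parent.
        return 'hello ' + root.sender.text


handler = PacketHandler()
result_ping = handler.handle(b'<ping/>')
result_greeting = handler.handle(
    b'<greeting><sender>Billy</sender></greeting>')
```

Using getattr with a default of None avoids an AttributeError that would otherwise be hard to tell apart from a bug inside a handler.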

score 0 · Accepted Answer

FWIW, in Amara Bindery you can do the following:

from amara import bindery
doc = bindery.parse(the_data)
top_elem = next(doc.xml_elements)
deleg = getattr(self, 'elem_' + str(top_elem.xml_qname))
deleg(doc)

You also get a Pythonic API, e.g.: doc.html.head.title = u"Change HTML document title"
