python - Parsing .xml with a prefix on the tag? xml.etree.ElementTree

Question

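I can read tags, except when they have a prefix. I had no luck searching for a previous question about this.

I need to read media:content. I tried image = node.find("media:content"). RSS input: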

<channel>
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>

I can read the sibling tag title.

from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.findall('.//channel/item'):
    title = node.find("title").text

I have been working from the documentation, but I am still stuck on the "prefix" part.

score 5 · Accepted Answer

Here is an example of using XML namespaces with ElementTree:

>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>
'''
>>> from xml.etree import ElementTree
>>> node = ElementTree.fromstring(x)
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
        print(elem.text)


photography/misc
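
The question asks for media:content rather than media:category, so here is a minimal sketch of the same idea that passes a prefix-to-URI mapping to find() instead of spelling out the {uri} form. The URI is just the placeholder from the example above; a real feed declares its own in xmlns:media, and that value is what has to go in the mapping. The helper name image_urls is hypothetical.

```python
from xml.etree import ElementTree

# Placeholder URI taken from the example above (an assumption): use whatever
# URI the real feed declares in its xmlns:media attribute.
NAMESPACES = {"media": "http://www.w3.org/TR/html4/"}

RSS = """\
<channel xmlns:media="http://www.w3.org/TR/html4/">
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
</channel>
"""


def image_urls(xml_text):
    """Return the url attribute of each item's media:content element."""
    root = ElementTree.fromstring(xml_text)
    urls = []
    for item in root.findall("item"):
        # The second argument maps the "media" prefix to its namespace URI.
        content = item.find("media:content", NAMESPACES)
        if content is not None:
            urls.append(content.get("url"))
    return urls


print(image_urls(RSS))  # ['http://foo.com/1.jpg']
```

The prefix used in the mapping only has to match the one used in the search path; ElementTree matches elements by the URI, not by the prefix written in the file.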

score 0 · Accepted Answer

media is an XML namespace prefix; it must be declared beforehand with xmlns:media="...". See http://lxml.de/xpathxslt.html#namespaces-and-prefixes for how to define XML namespaces for use in XPath expressions in lxml.
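
To illustrate why the declaration matters, here is a small sketch using the standard library's xml.etree.ElementTree rather than lxml. It assumes two hypothetical one-line documents, one with and one without the xmlns:media declaration; the URI is again only a placeholder, and the helper name parses is made up for this example.

```python
from xml.etree import ElementTree

# Hypothetical inputs: identical except for the xmlns:media declaration.
UNDECLARED = (
    '<channel><item>'
    '<media:content url="http://foo.com/1.jpg"/>'
    '</item></channel>'
)
DECLARED = (
    '<channel xmlns:media="http://www.w3.org/TR/html4/"><item>'
    '<media:content url="http://foo.com/1.jpg"/>'
    '</item></channel>'
)


def parses(xml_text):
    """Return True if ElementTree can parse the text, False on a ParseError."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError:
        # An undeclared prefix is a well-formedness error for the parser.
        return False
    return True


print(parses(UNDECLARED))  # False
print(parses(DECLARED))    # True
```

If the real file parses at all, the prefix is declared somewhere on an enclosing element (often the root), and that declaration is where to look up the URI.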
