python - Targeting specific child elements when parsing XML with Python

Question

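I'm building a simple parser to handle a regular data feed at work. This post, XML to csv(-like) format, has been very helpful. In my solution I use for loops to iterate over all the elements/child elements I need to target, but I'm still a bit stuck.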

For example, my XML file is structured like this:

<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>

I only want to target the second identifier, and only the first contributor. Any suggestions on how I could do this?

Cheers!

score 0 · Accepted Answer

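The other answer you pointed to has an example of how to turn all instances of a tag into a list. You could loop over those and discard the ones you aren't interested in.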
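A minimal sketch of that list-based approach, assuming the sample document is held in a string (the variable name xml_data is just for illustration): findall returns every matching child in document order, so ordinary 0-based Python indexing picks out the element you want.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (with the closing </root> tag fixed)
xml_data = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>"""

root = ET.fromstring(xml_data)
product = root.find("product")

# findall returns all matching children as a list, in document order
identifiers = product.findall("identifier")
contributors = product.findall("contributor")

# Python lists are 0-based: [1] is the second identifier, [0] the first contributor
second_identifier = identifiers[1].text
first_contributor = contributors[0].text
print(second_identifier, first_contributor)
```

This works, but it builds the full lists only to throw most of each away.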

However, there is a way to do this directly with XPath: the mini-language supports item indices in brackets:

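import xml.etree.ElementTree as etree

document = etree.parse("your.xml")

# Bracketed XPath positions: [2] = second match, [1] = first match
secondIdentifier = document.find(".//product/identifier[2]")
firstContributor = document.find(".//product/contributor[1]")
# find() returns Element objects; .text gives their text content
print(secondIdentifier.text, firstContributor.text)

prints

ab Alex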

Note that in XPath the first index is 1, not 0.
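The paths above start from the top of the document, so if the feed contains several product elements, find returns only the match inside the first one. A hedged sketch for a multi-product feed, assuming you want one row per product: loop over the products and apply the 1-based position relative to each one. find returns None when a product has no such child, which the sketch guards against. The second product and the helper name text_or_none are made-up illustrations.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with two products; the second has only one identifier
xml_data = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
  <product>
    <identifier>34</identifier>
    <contributor>Jo</contributor>
  </product>
</root>"""


def text_or_none(element):
    """Return the element's text, or None if find() found nothing."""
    return element.text if element is not None else None


root = ET.fromstring(xml_data)
rows = []
for product in root.iter("product"):
    # [2] and [1] are 1-based positions among this product's own children
    rows.append((
        text_or_none(product.find("identifier[2]")),
        text_or_none(product.find("contributor[1]")),
    ))

print(rows)  # [('ab', 'Alex'), (None, 'Jo')]
```

Each tuple could then be written out as one CSV row.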

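ElementTree's find and findall support only a subset of XPath, as described in the ElementTree documentation. Full XPath, briefly described on W3Schools and more thoroughly in the W3C's normative specification, is available in lxml, a third-party package that is nonetheless widely available. With lxml, the example would look like this: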

import lxml.etree as etree

document = etree.parse("your.xml")

# xpath() always returns a list, so take its first (here, only) item
secondIdentifier = document.xpath(".//product/identifier[2]")[0]
firstContributor = document.xpath(".//product/contributor[1]")[0]
print(secondIdentifier.text, firstContributor.text)
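If you stay with the standard library, the ElementTree subset still covers a few more predicates that help with this kind of targeting. A minimal sketch against the question's sample (xml_data is again just an assumed variable name): last() and last()-1 count positions from the end, and [tag='text'] keeps only elements that have a child with that exact text.

```python
import xml.etree.ElementTree as ET

xml_data = """<root>
  <product>
    <identifier>12</identifier>
    <identifier>ab</identifier>
    <contributor>Alex</contributor>
    <contributor>Steve</contributor>
  </product>
</root>"""

root = ET.fromstring(xml_data)

# last() selects the final matching sibling; last()-1 the one before it
last_contributor = root.find("product/contributor[last()]").text
first_identifier = root.find("product/identifier[last()-1]").text

# [tag='text'] keeps only products that have a contributor child with this text
alex_products = root.findall("product[contributor='Alex']")

print(last_contributor, first_identifier, len(alex_products))  # Steve 12 1
```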
