python - Simple XPath library for Python 3.x

Question

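I am trying to parse some very simple XML with Python.

Before Python 3, I used the "webscraping" library, which has XPath support. It was very simple to use:

xpath.search(xml (xml string), "XPath Query (//search)")

- Returns the elements found for the supplied XPath query.

Now I have decided to switch to Python 3, and the library mentioned above does not work properly (even after running 2to3.py), so I decided to use the native xml.etree.ElementTree library.

Maybe I am misunderstanding something, but this is a real nightmare. It does not work the way I expected, where you simply pass the XML and an XPath query to a function and get the results back. Instead you need 10+ lines of code, messing with the elements' children and so on, and it still does not work properly...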

import xml.etree.ElementTree as ET
doc = ET.fromstring(xml)
result = doc.findall("//XPath Query")

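This raises SyntaxError: cannot use absolute path on element. Prepending . to //XPath Query does not help much either.

Is there some reason why ElementTree and lxml are so complicated and do not let you simply use XPath, instead of messing with elements, writing for loops every time, and so on?

Can anyone recommend a simple Python 3 library that just takes an XPath query and returns the results?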

score 2 · Accepted Answer

Using the sample XML from http://docs.python.org/2/library/xml.etree.elementtree.html, the search seems to work fine:

>>> import xml.etree.ElementTree as ET
>>> xml = """..."""
>>> doc = ET.fromstring(xml)
>>> doc.findall(".//rank")
[<Element 'rank' at 0x10199ebd0>, <Element 'rank' at 0x10199e210>, <Element 'rank' at 0x10199e4d0>]

Or, if you want to search explicitly from the root:

>>> ET.ElementTree(doc).findall('//rank')
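
For the one-call interface the question asks for, a thin wrapper around the standard library may be enough. The following is a minimal sketch, assuming a hypothetical helper named `xpath_search` (it is not part of ElementTree) and a small made-up sample document. Note that ElementTree only implements a subset of XPath, so this will not cover every query.

```python
import xml.etree.ElementTree as ET


def xpath_search(xml_text, query):
    """Parse xml_text and return the elements matching query.

    Hypothetical helper, not part of ElementTree. Element.findall()
    rejects absolute paths, so a leading "//" is rewritten to ".//",
    which searches all descendants of the root element.
    """
    root = ET.fromstring(xml_text)
    if query.startswith("//"):
        query = "." + query
    return root.findall(query)


# Made-up sample data for illustration only.
sample = """
<data>
    <country name="A"><rank>1</rank></country>
    <country name="B"><rank>4</rank></country>
</data>
"""

ranks = [element.text for element in xpath_search(sample, "//rank")]
print(ranks)  # ['1', '4']
```

The helper returns ordinary Element objects, so `.text`, `.get()` and further `findall()` calls work on each result.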

score 2 · Accepted Answer

I have now found the problem.

My XML response contains the following:

<?xml version="1.0" encoding="utf-8"?>
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <!-- Call-specific Output Fields -->
  <HasMoreOrders> boolean </HasMoreOrders>
  <OrderArray> OrderArrayType
    <Order> OrderType
      <AdjustmentAmount currencyID="CurrencyCodeType"> AmountType (double) </AdjustmentAmount>
      <AmountPaid currencyID="CurrencyCodeType"> AmountType (double) </AmountPaid>
      <AmountSaved currencyID="CurrencyCodeType"> AmountType (double) </AmountSaved>
      <BuyerCheckoutMessage> string </BuyerCheckoutMessage>
      <BuyerUserID> UserIDType (string) </BuyerUserID>
      <CheckoutStatus> CheckoutStatusType
      ...

After parsing that XML:

tree = ET.fromstring(xml)
result = tree.findall("*")

every element comes back with the prefix {urn:ebay:apis:eBLBaseComponents} on its tag.
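
The prefix can be seen by printing the `tag` of each child. This is a small sketch using a shortened, made-up response in the same namespace; the element values are placeholders, not real eBay data.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up response that reuses the namespace from above.
xml = """
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <HasMoreOrders>false</HasMoreOrders>
  <OrderArray/>
</GetOrdersResponse>
"""

tree = ET.fromstring(xml)

# ElementTree stores the namespace URI in braces in front of the local name.
tags = [child.tag for child in tree.findall("*")]
for tag in tags:
    print(tag)
# {urn:ebay:apis:eBLBaseComponents}HasMoreOrders
# {urn:ebay:apis:eBLBaseComponents}OrderArray
```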

For example, if I need to search for <BuyerCheckoutMessage>:

result = tree.findall(".//BuyerCheckoutMessage")

it returns nothing, because the element's tag is actually {urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage.

So, to find an element, I need to put {urn:ebay:apis:eBLBaseComponents} in front of each element name in every XPath query.

So the solution is to use:

result = tree.findall(".//{urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage")

result[0].text will then return the element's value.
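
Typing the full namespace URI in every query gets tedious. `findall()` also accepts a prefix-to-URI mapping as its second argument, which keeps the queries shorter. A minimal sketch, assuming a made-up response body and an arbitrary local prefix `e` (the prefix name is my choice, not something the API or the response defines):

```python
import xml.etree.ElementTree as ET

# Made-up response in the same namespace; the message text is a placeholder.
xml = """
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <OrderArray>
    <Order>
      <BuyerCheckoutMessage>Please ship quickly</BuyerCheckoutMessage>
    </Order>
  </OrderArray>
</GetOrdersResponse>
"""

# "e" is an arbitrary alias; only the URI has to match the document.
ns = {"e": "urn:ebay:apis:eBLBaseComponents"}

tree = ET.fromstring(xml)
result = tree.findall(".//e:BuyerCheckoutMessage", ns)
messages = [element.text for element in result]
print(messages)  # ['Please ship quickly']
```

Every namespaced step in the path needs the prefix, for example `./e:OrderArray/e:Order`.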

Why it cannot just work like ET.search(xml, "XPath-query") is the biggest mystery to me. So much time wasted.
