python - Python XML XPath partial failure message

Question

Some code that I maintain uses the minidom library for XML parsing.

For an XML structure similar to the following:

<a val="a1">
  <b val="b1">
    <c val="c1">
      Data
    </c>
  </b>
</a>

the code looks like this:

for a in doc.getElementsByTagName("a"):
    aId = a.getAttribute("val").encode('ascii')
    if aId == aExpected:
        aFound = a
        break
else: # not found
    raise Exception("No A '%s' found" % aExpected)
for b in aFound.getElementsByTagName("b"):
    bId = b.getAttribute("val").encode('ascii')
    if bId == bExpected:
        bFound = b
        break
else: # not found
    raise Exception("No B '%s' found" % bExpected)
# similar for c

I would like to use XPath to look up the data. I can do that with ElementTree:

root.findall(".//a[@val='%s']/b[@val='%s']/c[@val='%s']" % (aExpected, bExpected, cExpected))

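The code looks much better now. However, when the data cannot be found in the XML, findall() simply returns an empty list (find() returns None), and I have to analyze the file by hand to locate the first element that did not match.

Is there a way in ElementTree (or another XML API) to use XPath and still have it report the first point at which the match failed (similar to the else clauses in the original code)?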
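
As a quick check of what the two lookup methods return on a miss, here is a minimal sketch. The inline XML string mirrors the sample above, wrapped in a hypothetical root element so that a can be matched by a path, and the value "nope" is made up for the failing case.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, wrapped in a hypothetical <root> element
# so that <a> is a descendant and can be matched by a path.
XML = '<root><a val="a1"><b val="b1"><c val="c1">Data</c></b></a></root>'
root = ET.fromstring(XML)

hit = root.findall(".//a[@val='a1']/b[@val='b1']/c[@val='c1']")
miss = root.findall(".//a[@val='a1']/b[@val='nope']/c[@val='c1']")
first_miss = root.find(".//a[@val='a1']/b[@val='nope']/c[@val='c1']")

print(len(hit))     # number of matching <c> elements
print(miss)         # result of findall() on a miss
print(first_miss)   # result of find() on a miss
```

In both miss cases the result carries no hint about which step of the path failed, which is the gap the question is about.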

As pointed out in one of the answers, the code can be replaced with:

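# find() returns None on a miss; compare with None explicitly, because
# an Element that has no children (such as c here) is itself falsy.
aFound = root.find(".//a[@val=%r]" % (aExpected,))
if aFound is None:
    raise Exception("A not present")
bFound = aFound.find("b[@val=%r]" % (bExpected,))
if bFound is None:
    raise Exception("B not present")
cFound = bFound.find("c[@val=%r]" % (cExpected,))
if cFound is None:
    raise Exception("C not present")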

Yes, this is definitely cleaner than the original, but I am looking for a library that would give me this information itself.

score 0 · Accepted Answer

For the following XML:

<a val="a1">
  <b val="b1">
    <c val="c1">
      Data
    </c>
  </b>
</a>

this code works:

import xml.etree.ElementTree as ET

file = "sample.xml"
aExpected = "a1"
bExpected = "b1"
cExpected = "c1"

tree = ET.parse(file)
root = tree.getroot()

bFound = root.find("./b[@val='" + bExpected + "']")
cFound = root.find(".//c[@val='" + cExpected + "']")

print(root)
print(bFound)
print(cFound)

The output is:

<Element 'a' at 0x02919B10>
<Element 'b' at 0x02919BD0>
<Element 'c' at 0x02919C30>

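xml.etree.ElementTree does not find a through XPath here, because a is the root element: the path is evaluated relative to root, so it only reaches the elements below it.

If you want to look up the a element as well, modify the XML as follows: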
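
To illustrate the point about the root element: a path passed to find() is evaluated relative to the element it is called on, so it can reach the children and descendants of root but not root itself. This is a minimal sketch that assumes the document is supplied as an inline string instead of sample.xml. It checks the root element directly through its tag and attribute:

```python
import xml.etree.ElementTree as ET

# Same document as above, with <a> as the root element.
XML = '<a val="a1"><b val="b1"><c val="c1">Data</c></b></a>'
root = ET.fromstring(XML)
aExpected = "a1"

# Searching for <a> below the root finds nothing, because <a> *is* the root.
via_path = root.find("./a[@val='" + aExpected + "']")
via_descendants = root.find(".//a[@val='" + aExpected + "']")

# The root can be tested by hand through its tag and attributes instead.
root_matches = root.tag == "a" and root.get("val") == aExpected

print(via_path)
print(via_descendants)
print(root_matches)
```

Checking root by hand avoids having to change the XML file, at the cost of treating the first level differently from the others.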

<root>
  <a val="a1">
    <b val="b1">
      <c val="c1">
        Data
      </c>
    </b>
  </a>
</root>

and the code:

import xml.etree.ElementTree as ET

file = "sample.xml"
aExpected = "a1"
bExpected = "b1"
cExpected = "c1"

tree = ET.parse(file)
root = tree.getroot()

aFound = root.find("./a[@val='" + aExpected + "']")
bFound = root.find(".//b[@val='" + bExpected + "']")
cFound = root.find(".//c[@val='" + cExpected + "']")

print(aFound)
print(bFound)
print(cFound)

The result will be:

<Element 'a' at 0x02919B10>
<Element 'b' at 0x02919BD0>
<Element 'c' at 0x02919C30>

Regards

score 0 · Accepted Answer

aFound = root.findall(".//a[@val=%r]" % (aExpected,))[0]
bFound = aFound.findall("b[@val=%r]" % (bExpected,))[0]
cFound = bFound.findall("c[@val=%r]" % (cExpected,))[0]

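An IndexError is raised on the first line where no element is found.

Alternatively, to avoid collecting all matching elements when only one is needed, use find: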

aFound = root.find(".//a[@val=%r]" % (aExpected,))
bFound = aFound.find("b[@val=%r]" % (bExpected,))
cFound = bFound.find("c[@val=%r]" % (cExpected,))

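Now an AttributeError (because NoneType has no find method) is raised on the line after the one where the element was not found. A missing c raises nothing at this point, since cFound is not used afterwards; the error only appears when cFound is first used.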
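
Neither variant names the step that failed in its error message. One way to get that, sketched here under the assumption that a small wrapper is acceptable, is to walk the path one step at a time. The name find_steps and the wording of the message are hypothetical and not part of ElementTree. The inline XML stands in for the sample file.

```python
import xml.etree.ElementTree as ET


def find_steps(start, steps):
    """Apply each path in `steps` in turn and report the first one that fails.

    `find_steps` is a hypothetical helper, not part of ElementTree.
    """
    current = start
    for index, step in enumerate(steps):
        found = current.find(step)
        if found is None:  # compare with None; truth-testing an Element is unreliable
            matched = "/".join(steps[:index]) or "(nothing)"
            raise LookupError("No match for %r (matched so far: %s)" % (step, matched))
        current = found
    return current


XML = '<root><a val="a1"><b val="b1"><c val="c1">Data</c></b></a></root>'
root = ET.fromstring(XML)

# All three steps match, so the <c> element is returned.
cFound = find_steps(root, [".//a[@val='a1']", "b[@val='b1']", "c[@val='c1']"])
print(cFound.text)

# The second step uses a made-up value, so the error names that step.
try:
    find_steps(root, [".//a[@val='a1']", "b[@val='bX']", "c[@val='c1']"])
except LookupError as exc:
    message = str(exc)
    print(message)
```

Like the chained find() calls, this takes the first match at each step, so it does not backtrack into a second a element if the first one lacks the expected b.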
