python - Filtering XML with Python

Question

I have the following XML document:

<node0>
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
</node0>

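I want to filter out node2 whenever a1="x2". The user supplies the XPath and the attribute value to test and filter on. I looked at a few Python options such as BeautifulSoup, but they are too complicated and do not preserve the case of the text. I want to keep the document exactly as it was, with only some elements filtered out.

Can you recommend a simple, concise solution? From the look of it, this should not be too complicated. The actual XML document is not as simple as the one above, but the idea is the same.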

score 7 · Accepted Answer

This uses xml.etree.ElementTree from the standard library:

import xml.etree.ElementTree as xee
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
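doc = xee.fromstring(data)

# findall() returns a list, so removing inside the loop is safe
for tag in doc.findall('node2'):
    # .get() returns None instead of raising KeyError when a1 is missing
    if tag.get('a1') == 'x2':
        doc.remove(tag)
# encoding='unicode' returns str rather than bytes on Python 3
print(xee.tostring(doc, encoding='unicode'))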
# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>
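
The same filter can also be written with an attribute predicate, which the path syntax of xml.etree.ElementTree accepts in current Python versions. A minimal sketch, assuming the same sample data; the names ET, root and remaining are illustrative and not part of the original answer:

```python
import xml.etree.ElementTree as ET

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
root = ET.fromstring(data)

# The predicate selects only the direct node2 children with a1="x2".
# findall() returns a list, so removing while looping over it is safe.
for elem in root.findall('node2[@a1="x2"]'):
    root.remove(elem)

# Collect the a1 values that are left to confirm the result.
remaining = [elem.get('a1') for elem in root.findall('node2')]
print(remaining)  # ['x1', 'x1']
```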

This uses lxml, which is not in the standard library but has a more powerful syntax:

import lxml.etree
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
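doc = lxml.etree.XML(data)
# find() would return only the first match; findall() returns all of them
for e in doc.findall('node2[@a1="x2"]'):
    doc.remove(e)
# encoding='unicode' returns str rather than bytes on Python 3
print(lxml.etree.tostring(doc, encoding='unicode'))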

# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>

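Edit: If node2 is buried deeper in the XML, you can iterate over all tags, check each parent tag to see whether a matching node2 element is one of its children, and remove it if so: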

Using only xml.etree.ElementTree:

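doc = xee.fromstring(data)
# getiterator() was removed in Python 3.9; iter() is the replacement.
# list() takes a snapshot so the tree can be modified during the loop.
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.get('a1') == 'x2':
            parent.remove(child)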
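
When the user supplies the path, the attribute name and the value, the same idea can be wrapped in a small helper. This is a rough sketch using only the standard library, not part of the original answer: remove_matching and parent_of are hypothetical names, and the path has to stay within the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET


def remove_matching(root, path, attr, value):
    """Remove every element found by `path` whose `attr` equals `value`."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # findall() returns a list, so the tree can be modified inside the loop.
    for elem in root.findall(path):
        parent = parent_of.get(elem)
        if parent is not None and elem.get(attr) == value:
            parent.remove(elem)
            removed += 1
    return removed


data = '''\
<node0>
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
</node0>
'''
root = ET.fromstring(data)
count = remove_matching(root, './/node2', 'a1', 'x2')
values = [elem.get('a1') for elem in root.iter('node2')]
print(count, values)  # 1 ['x1', 'x1']
```

The value is compared in Python rather than spliced into the path, which avoids quoting problems when it contains quote characters. With lxml the parent lookup would not be needed, since its elements provide getparent().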

Using lxml:

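doc = lxml.etree.XML(data)
# list() takes a snapshot so the tree can be modified during the loop
for parent in list(doc.iter('*')):
    # findall() removes every matching child, not just the first
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)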
