python - Requiring an attribute to be present on tags in XML

Question

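How can I drop every tag that lacks a specific attribute while parsing an XML document? For example, I want all tags (except the root, of course) to have a name attribute. I am using XML as a tree database, and a tag without a name makes no sense at all.

Of course, I could traverse all tags (depth-first) and check whether the attribute exists, but that takes some time for larger files.

I suppose there should be some option to do this with XMLParser... maybe using a schema?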

score 0 · Accepted Answer

With XPath and lxml, this should work:

from lxml import etree

xml = etree.XML("<root><a name='1'><b name='1-1'>ABC</b></a><a>Does not exist</a><a name='2'>DEF</a><a><b name='3-1'>GHI</b></a></root>")

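print('Before:')
print(etree.tostring(xml, encoding='unicode'))

# "/*//*[not(@name)]" matches every descendant of the root that has no name attribute.
# Use "//*[not(@name)]" to include the root tag; "/*/*[not(@name)]" would only
# check the root's direct children.
xp = etree.XPath("/*//*[not(@name)]")
all_nodes = xp(xml)
for x in all_nodes:
    parent = x.getparent()
    #if parent is None: continue # if the root tag is included, the parent is None
    parent.remove(x)  # drops the element together with its whole subtree

print('After:')
print(etree.tostring(xml, encoding='unicode'))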
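Since the question asks about filtering while parsing, here is a minimal sketch of a single-pass variant built on lxml's `etree.iterparse`. The function name `parse_dropping_unnamed` and its `attr` parameter are made up for illustration. It still inspects every element, so it mainly avoids a second traversal and is not a guaranteed speed-up. Children are removed when their parent's end event fires, so the parser never has to touch a removed element again.

```python
import io
from lxml import etree

def parse_dropping_unnamed(source, attr="name"):
    """Parse `source` and drop every non-root element lacking `attr`.

    Hypothetical helper: `source` is a filename or a binary file-like object.
    """
    root = None
    for _event, elem in etree.iterparse(source, events=("end",)):
        # All children of `elem` are fully parsed at its "end" event,
        # so it is safe to remove them here.
        for child in list(elem):
            # Skip comments and processing instructions (their tag is not a string).
            if isinstance(child.tag, str) and child.get(attr) is None:
                elem.remove(child)
        root = elem  # the last "end" event belongs to the root element
    return root

DATA = (b"<root><a name='1'><b name='1-1'>ABC</b></a><a>Does not exist</a>"
        b"<a name='2'>DEF</a><a><b name='3-1'>GHI</b></a></root>")

filtered_root = parse_dropping_unnamed(io.BytesIO(DATA))
print(etree.tostring(filtered_root, encoding="unicode"))
```

The root itself is never checked, which matches the requirement that only non-root tags need a name.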

score 0 · Accepted Answer

This is very easy in XSLT. Two template rules, one an identity rule that copies everything:

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

and another rule that discards the elements you don't want:

<xsl:template match="*[not(@specific-attribute)]"/>
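To try the two rules from Python, one option is lxml's `etree.XSLT`. The sketch below wraps them in a complete stylesheet and makes two assumptions beyond the answer: the attribute is called `name`, as in the question, and the discard rule carries an extra `parent::*` test so that the root element is kept even when it has no name. The variable names are illustrative.

```python
from lxml import etree

# Identity rule plus a rule that discards non-root elements without a name attribute.
# The "parent::*" test is an added assumption: the root's parent is the document
# node, not an element, so the root never matches the discard rule.
STYLESHEET = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[not(@name) and parent::*]"/>
</xsl:stylesheet>
""")

transform = etree.XSLT(STYLESHEET)

doc = etree.XML(
    "<root><a name='1'><b name='1-1'>ABC</b></a><a>Does not exist</a>"
    "<a name='2'>DEF</a><a><b name='3-1'>GHI</b></a></root>"
)

# The transform returns a new result tree; the input document is left unchanged.
filtered = transform(doc).getroot()
print(etree.tostring(filtered, encoding="unicode"))
```

Unlike the in-place XPath removal, this builds a new tree, so the original document stays available for comparison.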
