python - Using lxml, how do I find the sibling of a parent node?

Question

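XML keeps throwing me curveballs, and I am having a hard time finding documentation I can understand. So I apologize for all the questions over the past few days.

Anyway, I have the following XML: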

      <clade>
        <clade>
          <branch_length>0.5</branch_length>
          <clade>
            <name>MnPV1</name>
            <annotation>
<desc>Iotapapillomavirus 1</desc></annotation><chart><group>Iota</group></chart><branch_length>1.0</branch_length>
          </clade>
          <clade>

I want to change it to:

  <clade>
    <clade>
      <branch_length>0.5</branch_length>
      <clade>
        <name bgstyle="green">MnPV1</name>
        <annotation><desc>Iotapapillomavirus 1</desc><uri>http://pave.niaid.nih.gov/#fetch?id=MnPV1REF&amp;format=Locus%20view&amp;hasStructure=none</uri></annotation><chart><group>Iota</group></chart><branch_length>1.0</branch_length>
      </clade>
      <clade>

So I want to change:

<name>MnPV1</name>

to:

<name bgstyle="green">MnPV1</name>

The catch is that I first check whether this matches anything:

tree.xpath('//phylo:group[text()="Iota"]', namespaces=nsmap)

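If it does, I want to get the "uncle" of the group node (a sibling of its parent) so that I can edit the name node.

This is what I have come up with so far: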

tree = lxml.etree.XML(data)
nsmap = {'phylo': 'http://www.phyloxml.org'}
matches = tree.xpath('//phylo:group[text()="Iota"]', namespaces=nsmap)

for e in matches:
    uncle=e.getparent().getsibling() #however, getsibling() does not exist...

I would appreciate any help (and/or suggestions for an "lxml for dummies" guide).

score 5 · Accepted Answer

How about this?

>>> import lxml.etree
>>> data = r'''<clade>
...  <name>MnPV1</name>
...  <annotation>
...    <desc>Iotapapillomavirus 1</desc>
...  </annotation>
...  <chart>
...    <group>Iota</group>
...  </chart>
...  <branch_length>1.0</branch_length>
... </clade>'''
>>> tree = lxml.etree.XML(data)
>>> for name in tree.xpath('//group[text()="Iota"]/../preceding-sibling::name'):
...   name.attrib['bgstyle'] = 'green'
...
>>> print(lxml.etree.tostring(tree, pretty_print=True).decode())
<clade>
 <name bgstyle="green">MnPV1</name>
 <annotation>
   <desc>Iotapapillomavirus 1</desc>
 </annotation>
 <chart>
   <group>Iota</group>
 </chart>
 <branch_length>1.0</branch_length>
</clade>

>>>

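The trick is to use XML tools (for example, XPath and XSLT) to manipulate XML documents. The w3schools site is a good place to start. XPath on its own is quite powerful, and once you get the hang of it, it is very readable. That said, this kind of problem is best solved with XSLT. If you are going to be working with a lot of XML, do yourself a big favor and buy a copy of the Oxygen XML Editor or something similar.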
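The XPath above uses unprefixed names, while the question queries the `phylo` namespace. Here is a minimal sketch of the same expression with the prefix applied to every step. The sample document (including the `phyloxml` root and the second `clade`) is made up for illustration; only the namespace URI and the `nsmap` name come from the question.

```python
import lxml.etree

# Assumed sample: the question's fragment placed in the phyloXML default namespace.
data = '''<phyloxml xmlns="http://www.phyloxml.org">
  <clade>
    <name>MnPV1</name>
    <annotation><desc>Iotapapillomavirus 1</desc></annotation>
    <chart><group>Iota</group></chart>
    <branch_length>1.0</branch_length>
  </clade>
  <clade>
    <name>Other</name>
    <chart><group>Alpha</group></chart>
  </clade>
</phyloxml>'''

nsmap = {'phylo': 'http://www.phyloxml.org'}
tree = lxml.etree.XML(data)

# group -> its parent (chart) -> name elements that are earlier siblings of chart
expr = '//phylo:group[text()="Iota"]/../preceding-sibling::phylo:name'
for name in tree.xpath(expr, namespaces=nsmap):
    name.set('bgstyle', 'green')

# Collect the text of every name element that was styled.
styled = [n.text for n in tree.iter('{http://www.phyloxml.org}name')
          if n.get('bgstyle') == 'green']
```

Every element step needs the prefix here, because XPath 1.0 treats an unprefixed name as belonging to no namespace.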

If you are looking for something that uses less XPath and more Python, call getparent and then getprevious. I am not sure how widely supported getparent and getprevious are, but they are documented and they work.
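A rough sketch of that getparent/getprevious approach, assuming the unnamespaced sample from this answer. The helper name `find_uncle` is made up for illustration and is not part of lxml.

```python
import lxml.etree

data = '''<clade>
  <name>MnPV1</name>
  <annotation><desc>Iotapapillomavirus 1</desc></annotation>
  <chart><group>Iota</group></chart>
  <branch_length>1.0</branch_length>
</clade>'''

tree = lxml.etree.XML(data)


def find_uncle(element, tag):
    """Return the nearest preceding sibling of element's parent with this tag, or None."""
    parent = element.getparent()
    if parent is None:
        return None
    sibling = parent.getprevious()
    # Walk backwards over the parent's siblings until the tag matches.
    while sibling is not None and sibling.tag != tag:
        sibling = sibling.getprevious()
    return sibling


for group in tree.iter('group'):
    if group.text == 'Iota':
        uncle = find_uncle(group, 'name')
        if uncle is not None:
            uncle.set('bgstyle', 'green')
```

getprevious only returns the immediately preceding sibling, hence the loop. lxml also provides `itersiblings(preceding=True)`, which could replace the manual walk.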

score 2 · Accepted Answer

Here is a simple XSLT solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="clade[chart/group='Iota']/name">
  <name bgstyle="green"><xsl:apply-templates/></name>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<clade>
    <clade>
        <branch_length>0.5</branch_length>
        <clade>
            <name>MnPV1</name>
            <annotation>
                <desc>Iotapapillomavirus 1</desc>
            </annotation>
            <chart>
                <group>Iota</group>
            </chart>
            <branch_length>1.0</branch_length>
        </clade>
    </clade>
</clade>

the wanted, correct result is produced:

<clade>
   <clade>
      <branch_length>0.5</branch_length>
      <clade>
         <name bgstyle="green">MnPV1</name>
         <annotation>
            <desc>Iotapapillomavirus 1</desc>
         </annotation>
         <chart>
            <group>Iota</group>
         </chart>
         <branch_length>1.0</branch_length>
      </clade>
   </clade>
</clade>

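Explanation:

Using and overriding the identity rule, which copies "as-is" every node for which it is selected for execution.
A simple overriding template matches the wanted name element and replaces it with an element of the same name that carries the wanted new attribute.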
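Since the question is about lxml, here is a sketch of how this stylesheet could be run from Python with `lxml.etree.XSLT`. The variable names are illustrative, and the input is the unnamespaced sample shown above. A document in the phyloXML namespace would need prefixed names in the match pattern.

```python
import lxml.etree

xslt_source = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="clade[chart/group='Iota']/name">
  <name bgstyle="green"><xsl:apply-templates/></name>
 </xsl:template>
</xsl:stylesheet>'''

xml_source = '''<clade>
  <clade>
    <branch_length>0.5</branch_length>
    <clade>
      <name>MnPV1</name>
      <annotation><desc>Iotapapillomavirus 1</desc></annotation>
      <chart><group>Iota</group></chart>
      <branch_length>1.0</branch_length>
    </clade>
  </clade>
</clade>'''

# Compile the stylesheet once; the resulting object is callable.
transform = lxml.etree.XSLT(lxml.etree.XML(xslt_source))

# Apply it to the parsed input; the result behaves like an ElementTree.
result = transform(lxml.etree.XML(xml_source))
output = str(result)
```

`str(result)` serializes according to the `xsl:output` settings, and `result.getroot()` gives the transformed tree for further processing in Python.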
