xml - Python problem with xml.dom.minidom documents: extra blank lines between child elements when using toprettyxml()

Question

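Please bear with me, as I am very new to Python (and to the wider programming community), but I have been working under the guidance of a colleague who is more experienced than I am. We are trying to write a Python script that reads an XML file, picks out certain parts of the data, edits some variable values, and then reassembles the XML. The problem we are having is with the way the data is formatted when it is passed back to the new document using toprettyxml().

Basically, the top half of the file has a bunch of elements that we don't need to modify at all, so we are trying to grab those elements whole and then append them to the root when we reassemble. Some of the lower elements on the same page, at the same level, are picked apart into smaller items in memory and reassembled at the lowest child level. The ones that are reassembled and appended by hand work fine.

So here is roughly what should be the relevant bits of code: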

def __handleElemsWithAtrributes(elem):
    #returns empty element with all attributes of source element
    tmpDoc = Document()
    result = tmpDoc.createElement(elem.item(0).tagName)
    attr_map = elem.item(0).attributes
    for i in range(attr_map.length):
        result.setAttribute(attr_map.item(i).name,attr_map.item(i).value)
    return result

def __getWholeElement(elems):
    #returns element with all attributes of source element and all contents
    if len(elems) == 0:
        return 0
    temp = Document()
    for e in elems:
        result = temp.createElement(e.tagName)
        attr_map = e.attributes
        for i in range(attr_map.length):
            result.setAttribute(attr_map.item(i).name,attr_map.item(i).value)
        result = e
    return result


def __init__():
      ##A bunch of other stuff I'm leaving out...
                f = xml.dom.minidom.parse(pathToFile)
                doc = Document()

                modules = f.getElementsByTagName("Module")
                descriptions = f.getElementsByTagName("Description")
                steptree = f.getElementsByTagName("StepTree")
                reference = f.getElementsByTagName("LessonReference")

                mod_val = __handleElemsWithAtrributes(modules)
                des_val = __getWholeElement(descriptions)
                step_val = __getWholeElement(steptree)
                ref_val = __getWholeElement(reference)

                if des_val != 0 and mod_val != 0 and step_val != 0 and ref_val != 0:
                    mod_val.appendChild(des_val)
                    mod_val.appendChild(step_val)
                    mod_val.appendChild(ref_val)
                    doc.appendChild(mod_val)
                o.write(doc.toprettyxml())

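No, the indentation here isn't preserved exactly, because I copied from a few different areas, but I'm sure you get the gist.

Basically, the input I'm using looks like this: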

<Module aatribute="" attribte2="" attribute3="" >
<Description>
    <Title>SomeTitle</Title>
    <Objective>An objective</Objective>
    <Action>
        <Familiarize>familiarize text</Familiarize>
    </Action>
    <Condition>
        <Familiarize>Condition text</Familiarize>
    </Condition>
    <Standard>
        <Familiarize>Standard text</Familiarize>
    </Standard>
    <PerformanceMeasures>
        <Measure>COL text</Measure>
    </PerformanceMeasures>
    <TMReferences>
        <Reference>Reference text</Reference> 
    </TMReferences>
</Description>

Then, when it is reassembled, it looks like this:

<Module aatribute="" attribte2="" attribute3="" >
<Description>


    <Title>SomeTitle</Title>


    <Objective>An objective</Objective>


    <Action>


        <Familiarize>familiarize text</Familiarize>


    </Action>


    <Condition>


        <Familiarize>Condition text</Familiarize>


    </Condition>


    <Standard>


        <Familiarize>Standard text</Familiarize>


    </Standard>


    <PerformanceMeasures>


        <Measure>COL text</Measure>


    </PerformanceMeasures>


    <TMReferences>


        <Reference>Reference text</Reference> 


    </TMReferences>


</Description>

How do I make it stop producing all the extra blank lines? Any ideas?

score 2 · Accepted Answer

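I had the same problem. The issue is that every line break in the source file ends up as a text node in your tree. That makes toprettyxml() a rather nasty function here, because it writes its own newline and indentation around each of those whitespace-only nodes without warning you, and that is where the blank lines come from.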
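
To see where those text nodes come from, here is a minimal sketch. The sample XML string is made up for illustration and only mimics the shape of the input in the question.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical two-level document, indented the way a hand-written file would be
doc = parseString("<Description>\n    <Title>SomeTitle</Title>\n</Description>")
root = doc.documentElement

# Each child is either an element or a text node holding the indentation
kinds = [
    "text" if child.nodeType == Node.TEXT_NODE else child.nodeName
    for child in root.childNodes
]
print(kinds)                       # ['text', 'Title', 'text']
print(repr(root.firstChild.data))  # '\n    '
```

The two "text" entries are the line breaks and indentation around <Title>; they are the nodes that need to go before pretty-printing.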

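One solution is to find a way to erase all the useless text nodes right after parsing the file (I am looking for one right now and still haven't found a "pretty" solution).

Deleting node by node: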

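from xml.dom import Node

def cleanUpNodes(nodes):
    # Blank out whitespace-only text nodes among the direct children
    for node in nodes.childNodes:
        if node.nodeType == Node.TEXT_NODE and not node.data.strip():
            node.data = ''
    # normalize() discards the emptied text nodes
    nodes.normalize()

From http://mail.python.org/pipermail/xml-sig/2004-March/010191.html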
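
If changing the tree is not an option, another workaround (not from the mailing-list post, just a sketch) is to leave the nodes alone and drop the whitespace-only lines from the string that toprettyxml() returns. The helper name pretty_without_blank_lines and the sample document are assumptions made for this example. Note that it would also remove intentionally blank lines inside multi-line text content.

```python
from xml.dom.minidom import parseString

def pretty_without_blank_lines(doc, indent="    "):
    """Pretty-print doc, then drop every line that is empty or whitespace-only."""
    pretty = doc.toprettyxml(indent=indent)
    return "\n".join(line for line in pretty.splitlines() if line.strip())

# Hypothetical input shaped like the one in the question
SAMPLE = """<Module>
<Description>
    <Title>SomeTitle</Title>
    <Action>
        <Familiarize>familiarize text</Familiarize>
    </Action>
</Description>
</Module>"""

doc = parseString(SAMPLE)
pretty = pretty_without_blank_lines(doc)
print(pretty)
```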

score -1 · Accepted Answer

Thanks, this works recursively!

def cleanUpNodes(self, nodes):
    for node in nodes.childNodes:
        # Blank out text nodes that hold nothing but whitespace
        if node.nodeType == node.TEXT_NODE and not node.data.strip():
            node.data = ''
        # Recurse into child elements
        if node.nodeType == node.ELEMENT_NODE:
            self.cleanUpNodes(node)
    # normalize() discards the emptied text nodes
    nodes.normalize()
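
For a complete round trip, here is a standalone sketch of the same recursive idea that can be run as-is. It removes the nodes directly instead of emptying them and calling normalize(). The function name strip_whitespace_nodes and the sample XML are assumptions for illustration, not part of the original code.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, since removeChild() mutates childNodes
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)

# Hypothetical input shaped like the one in the question
SAMPLE = """<Module>
<Description>
    <Title>SomeTitle</Title>
    <Action>
        <Familiarize>familiarize text</Familiarize>
    </Action>
</Description>
</Module>"""

doc = parseString(SAMPLE)
strip_whitespace_nodes(doc.documentElement)
pretty = doc.toprettyxml(indent="    ")
print(pretty)
```

Run the cleanup once, right after parsing and before anything is appended to the new document, so that toprettyxml() only ever sees element nodes and real text.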
