python - Empty lines when using minidom.toprettyxml

Question

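I have been using minidom.toprettyxml to prettify my XML file. When I create the XML file and use this method, everything works fine. But if I use it after modifying the XML file (for example, after adding an extra node) and then write the result back to the file, I get empty lines, and every update adds more of them...

My code: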

file.write(prettify(xmlRoot))


def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml is xml.etree.ElementTree
    reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom
    return reparsed.toprettyxml(indent=" ")

Result:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0"     tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">




    <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>




    <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>


    <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

Any suggestions?

Thanks.
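
The growth described above can be reproduced without any files. The following is a minimal sketch, assuming Python 3; the helper name `round_trip` and the sample document are made up for illustration and are not part of the original code. It feeds the output of toprettyxml back into the parser a few times and counts the resulting lines.

```python
from xml.dom import minidom

def round_trip(xml_text, times):
    """Re-parse and re-prettify xml_text `times` times; return line counts."""
    counts = []
    for _ in range(times):
        # Each pass keeps the whitespace text nodes produced by the
        # previous pass and wraps them in new indentation and newlines.
        xml_text = minidom.parseString(xml_text).toprettyxml(indent=" ")
        counts.append(len(xml_text.splitlines()))
    return counts

line_counts = round_trip("<testsuite><testcase/><testcase/></testsuite>", 3)
print(line_counts)  # the count grows with every pass
```

The first pass starts from XML with no whitespace between elements, so it looks fine; the extra lines only show up once already-indented XML is parsed and prettified again.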

score 29 · Accepted Answer

I found a solution here: http://code.activestate.com/recipes/576750-pretty-print-xml/

I then modified it to take a string instead of a file.

from xml.dom.minidom import parseString

pretty_print = lambda data: '\n'.join([line for line in parseString(data).toprettyxml(indent=' '*2).split('\n') if line.strip()])

Output:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0" tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">
  <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>
  <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>
  <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

This may make it easier to work into your own function:

def new_prettify():
    # CONTENT is the XML string to format (defined elsewhere)
    reparsed = parseString(CONTENT)
    print('\n'.join([line for line in reparsed.toprettyxml(indent=' '*2).split('\n') if line.strip()]))
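
The same filter can also be written as a named function that returns the string, which makes it easy to check that saving repeatedly no longer adds lines. This is only a sketch, assuming Python 3; the `indent` parameter and the sample XML are illustrative choices.

```python
from xml.dom.minidom import parseString

def pretty_print(data, indent="  "):
    """Return data as indented XML with whitespace-only lines removed."""
    pretty = parseString(data).toprettyxml(indent=indent)
    return "\n".join(line for line in pretty.splitlines() if line.strip())

once = pretty_print("<testsuite><testcase name='t1'/><testcase name='t2'/></testsuite>")
twice = pretty_print(once)  # simulate reading the file back and saving it again
print(twice)
```

Note that the filter drops every whitespace-only line, so blank lines inside multi-line text content would be removed as well.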

score 6 · Accepted Answer

I found a simple solution to this problem: just change the last line of prettify(), so that it becomes:

def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml is xml.etree.ElementTree
    reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom
    return reparsed.toprettyxml(indent=" ", newl='')

score 2 · Accepted Answer

Use this to fix the blank-line problem:

toprettyxml(indent=' ', newl='\r', encoding="utf-8")

Answered 2015-07-07T22:37:23.800

score 1 · Accepted Answer

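I ran into the same problem with Python 2.7 (32-bit) on a Windows 10 machine. The problem seems to be that when Python parses the XML text into an ElementTree object, each element ends up with some annoying newlines in its "text" or "tail" attribute.

This script removes those newlines: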

import re

def removeAnnoyingLines(elem):
    hasWords = re.compile("\\w")
    for element in elem.iter():
        if not re.search(hasWords,str(element.tail)):
            element.tail=""
        if not re.search(hasWords,str(element.text)):
            element.text = ""

Call this function before "pretty-printing" your tree:

import xml.dom.minidom
import xml.etree.ElementTree

removeAnnoyingLines(element)
myXml = xml.dom.minidom.parseString(xml.etree.ElementTree.tostring(element))
print(myXml.toprettyxml())

It worked for me. I hope it works for you!
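
A variation on the same idea, offered as a sketch rather than a replacement: test for whitespace with `str.strip()` instead of the `\w` regex, so that text made only of punctuation (for example `--`) is kept. It assumes Python 3; the function name `strip_blank_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

def strip_blank_text(root):
    """Clear text/tail values that contain only whitespace."""
    for element in root.iter():
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None

# Sample input with blank lines between elements, as left by earlier saves.
root = ET.fromstring("<suite>\n\n  <case name='a-b'>--</case>\n\n</suite>")
strip_blank_text(root)
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(pretty)
```

Because the whitespace is removed before minidom sees the document, toprettyxml has nothing left to wrap in extra newlines.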

score 1 · Accepted Answer

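Here is a Python 3 solution that gets rid of the ugly newline problem (lots of whitespace) and, unlike most other implementations, uses only the standard library.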

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline problem here.
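
On Python 3.9 or later there is another standard-library option that avoids minidom altogether: `xml.etree.ElementTree.indent()`. The sketch below is not part of the answer above; it assumes Python 3.9+, and the function name `pretty_xml_string` and the sample input are illustrative.

```python
import xml.etree.ElementTree as ET

def pretty_xml_string(root, space="  "):
    """Indent the tree in place (Python 3.9+) and return it as a string."""
    ET.indent(root, space=space)  # rewrites whitespace-only text/tail values
    return ET.tostring(root, encoding="unicode")

# Sample input with stray blank lines between the elements.
root = ET.fromstring("<suite>\n\n\n  <case/>\n\n<case/></suite>")
first = pretty_xml_string(root)
second = pretty_xml_string(ET.fromstring(first))  # re-read and re-indent
print(second)
```

Since `ET.indent()` overwrites existing whitespace-only text and tail values instead of adding to them, repeated load/save cycles should give the same result each time.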

score 0 · Accepted Answer

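The problem is that minidom does not handle newline characters well (on Windows). It does not need them anyway, so the solution is to remove them from the string. Replace:

reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom

with:

reparsed = mini.parseString(rough_string.replace('\n',''))  # mini is xml.dom.minidom

Note, however, that this is a Windows-only solution.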

score 0 · Accepted Answer

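Because minidom's toprettyxml inserts too many lines, my solution was to delete the lines that contain no useful data, by checking whether a line has at least one "<" character (there may be a better idea). This worked very well for a similar problem I had (on Windows).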

text = md.toprettyxml() # get the prettyxml string from minidom Document md
# text = text.replace('    ', '\t') # for those using tabs :)
spl = text.split('\n') # split lines into a list
spl = [i for i in spl if '<' in i] # keep only element with data inside
text = '\n'.join(spl) # join again all elements of the filtered list into a string

# write the result to file (I use codecs because I needed the utf-8 encoding)
import codecs # if not imported yet (just to show this import is needed)
with codecs.open('yourfile.xml', 'w', encoding='utf-8') as f:
    f.write(text)
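
One limitation of the "<" check is that a line in the middle of multi-line text content has no "<" and would be dropped. A possible alternative, sketched here under the assumption of Python 3, is to remove the whitespace-only text nodes from the minidom document before calling toprettyxml. The function name `remove_blank_text_nodes` and the sample document are hypothetical.

```python
from xml.dom import minidom, Node

def remove_blank_text_nodes(node):
    """Recursively drop text nodes that contain only whitespace."""
    for child in list(node.childNodes):  # copy: the list is modified below
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            remove_blank_text_nodes(child)

# The <note> element holds three lines of text; the middle one has no "<".
md = minidom.parseString("<r>\n\n  <note>a\nb\nc</note>\n\n  <item/>\n</r>")
remove_blank_text_nodes(md)
text = md.toprettyxml(indent="  ")
print(text)
```

Here the middle line of the note survives, while the blank lines between elements are gone.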
