python - Creating an XML tree from a text file with Python

Question

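When parsing a text file, I need to avoid creating duplicate branches in the XML tree. Suppose the text file looks like this (the order of the lines is random):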

branch1:branch11:message11
branch1:branch12:message12
branch2:branch21:message21
branch2:branch22:message22

So the resulting XML tree should have a root with two branches, each of which has two sub-branches. The Python code I use to parse this text file is as follows:

import string
fh = open ('xmlbasic.txt', 'r')
allLines = fh.readlines()
fh.close()
import xml.etree.ElementTree as ET
root = ET.Element('root')

for line in allLines:
   tempv = line.split(':')
   branch1 = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

The problem with this code is that a new branch is created in the XML tree for every line of the text file.

Any suggestions on how to avoid creating another branch in the XML tree when a branch with that name already exists?

score 1 · Accepted Answer

import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # splitlines() gives one entry per line, without the trailing newline
    lines = lines_file.read().splitlines()

root = ET.Element('root')

for line in lines:
    head, subhead, tail = line.split(":")

    # find() returns None when there is no matching child; test against None,
    # because an Element without children evaluates as false
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)

The logic is simple, and you already stated it in your question: before creating a branch, check whether it already exists in the tree.
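The check-before-create idea can be tried without a file. The following is a minimal sketch, assuming the `head:subhead:message` line format from the question; the helper names `get_or_create` and `build_tree` and the in-memory `sample` list are made up for illustration.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the direct child of `parent` named `tag`, creating it if missing."""
    child = parent.find(tag)  # a bare tag name matches direct children only
    if child is None:         # compare with None, not truthiness
        child = ET.SubElement(parent, tag)
    return child


def build_tree(lines):
    """Build a <root> element from 'head:subhead:message' lines, reusing branches."""
    root = ET.Element("root")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, message = line.split(":")
        get_or_create(get_or_create(root, head), subhead).text = message
    return root


# Hypothetical sample data in the question's format, deliberately out of order.
sample = [
    "branch2:branch21:message21",
    "branch1:branch11:message11",
    "branch2:branch22:message22",
    "branch1:branch12:message12",
]
root = build_tree(sample)
print(ET.tostring(root, encoding="unicode"))
```

With this input the root ends up with two branches of two sub-branches each, regardless of line order; the branches appear in the order in which they were first seen.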

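Note that this can be inefficient, because every line triggers a linear search through the branches created so far. ElementTree is not designed to enforce uniqueness.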

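If you need speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict and convert it to an ElementTree afterwards.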

import collections
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # splitlines() gives one entry per line, without the trailing newline
    lines = lines_file.read().splitlines()

# head -> {subhead -> message}; dictionary keys are unique by construction
root_dict = collections.defaultdict(dict)
for line in lines:
    head, subhead, tail = line.split(":")
    root_dict[head][subhead] = tail

# Convert the nested dictionaries into elements in a single pass
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)

score 0 · Accepted Answer

Something along these lines? You keep the first-level branches in a dictionary so they can be reused.

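# Reuses ET, root and allLines from the question's code.
b1map = {}  # first-level tag name -> Element already created for it

for line in allLines:
   tempv = line.strip().split(':')
   branch1 = b1map.get(tempv[0])
   if branch1 is None:
       branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]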
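The snippet above only caches the first level, so a repeated second-level name would still produce a duplicate sub-branch. One possible extension, sketched here under the assumption that every line has exactly three colon-separated fields, is to key the cache on the full path. The function name `build_tree_cached` and the sample lines are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tree_cached(lines):
    """Build <root> from 'head:subhead:message' lines using a dictionary cache."""
    root = ET.Element("root")
    cache = {}  # path tuple, e.g. ("branch1", "branch11") -> Element
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, message = line.split(":")

        branch1 = cache.get((head,))
        if branch1 is None:
            branch1 = cache[(head,)] = ET.SubElement(root, head)

        branch2 = cache.get((head, subhead))
        if branch2 is None:
            branch2 = cache[(head, subhead)] = ET.SubElement(branch1, subhead)

        # A later line with the same path overwrites the earlier message.
        branch2.text = message
    return root


# Hypothetical input in which one path occurs twice.
cached_root = build_tree_cached([
    "branch1:branch11:old message",
    "branch2:branch21:message21",
    "branch1:branch11:new message",
])
print(ET.tostring(cached_root, encoding="unicode"))
```

Each lookup is a dictionary access rather than a scan of existing children, which is the same trade-off the dictionary-based answers make.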
