I modified Daniel's answer to produce a slightly neater dictionary:
def xml_to_dictionary(element):
    # namespace is a module-level string such as '{http://example.com/ns}'
    l = len(namespace)
    tag = element.tag[l:]
    # Start from an empty dict so the comparisons below cannot raise KeyError
    # when element.text is None.
    dictionary = {tag: {}}
    if element.text and element.text != ' ':
        dictionary[tag] = element.text
    children = list(element)  # getchildren() was removed in Python 3.9
    if children:
        subdictionary = {}
        for child in children:
            for k, v in xml_to_dictionary(child).items():
                if k in subdictionary:
                    # Repeated subtags are collected into a list
                    if isinstance(subdictionary[k], list):
                        subdictionary[k].append(v)
                    else:
                        subdictionary[k] = [subdictionary[k], v]
                else:
                    subdictionary[k] = v
        if dictionary[tag] == {}:
            dictionary[tag] = subdictionary
        else:
            dictionary[tag] = [dictionary[tag], subdictionary]
    if element.attrib:
        attribs = dict(element.attrib)
        if dictionary[tag] == {}:
            dictionary[tag] = attribs
        else:
            dictionary[tag] = [dictionary[tag], attribs]
    return dictionary
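namespace is the xmlns string, including the braces, that ElementTree prepends to every tag. I strip it here because the whole document uses a single namespace.
Note that I also adjusted the raw XML so that "empty" tags produce at most a ' ' text property in the ElementTree representation: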
import re
from xml.etree import ElementTree

spacepattern = re.compile(r'\s+')
mydictionary = xml_to_dictionary(ElementTree.XML(spacepattern.sub(' ', content)))
which would give, for instance:
{'note': {'to': 'Tove',
          'from': 'Jani',
          'heading': 'Reminder',
          'body': "Don't forget me this weekend!"}}
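If you would rather not hard-code namespace, it can be read off the root tag. This is only a minimal sketch: get_namespace is a hypothetical helper name and the example.com URI is made up for illustration.

```python
from xml.etree import ElementTree

def get_namespace(element):
    """Return the '{uri}' prefix of element.tag, or '' if there is none."""
    tag = element.tag
    if tag.startswith('{'):
        # Keep everything up to and including the closing brace
        return tag[:tag.index('}') + 1]
    return ''

# Illustrative document with a default namespace (URI is an assumption)
root = ElementTree.XML('<note xmlns="http://example.com/ns"><to>Tove</to></note>')
namespace = get_namespace(root)
print(namespace)                   # {http://example.com/ns}
print(root.tag[len(namespace):])   # note
```

This only works as-is when the whole document shares one namespace, which is the assumption the function above already makes.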
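It is designed for a specific kind of XML that is essentially equivalent to JSON. It should also handle element attributes, for example:
<elementName attributeName='attributeContent'>elementContent</elementName>
It would also be possible to merge the attribute dictionary with the subtag dictionary, similar to how repeated subtags are merged, although nesting the lists seems kind of appropriate :-)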
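For what it's worth, here is a rough sketch of what such merging might look like. It is not part of the function above: add_value and merged_contents are hypothetical helpers, and for brevity the sketch only looks one level deep and ignores namespaces.

```python
from xml.etree import ElementTree

def add_value(target, key, value):
    """Store value under key, turning repeated keys into a list."""
    if key in target:
        if isinstance(target[key], list):
            target[key].append(value)
        else:
            target[key] = [target[key], value]
    else:
        target[key] = value

def merged_contents(element):
    """Hypothetical helper: one flat dict of attributes and child text."""
    merged = {}
    for k, v in element.attrib.items():
        add_value(merged, k, v)
    for child in element:
        # One level only: a full version would recurse into each child
        add_value(merged, child.tag, child.text)
    return merged

element = ElementTree.XML(
    "<item id='1'><name>a</name><name>b</name><id>2</id></item>")
print(merged_contents(element))
# {'id': ['1', '2'], 'name': ['a', 'b']}
```

The downside shows up in the output: once merged, you can no longer tell whether 'id' came from an attribute or from a subtag, which is one argument for keeping the nested lists.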