The lxml FAQ offers the following:

How can I map an XML tree into a dict of dicts?

I'm glad you asked.

def recursive_dict(element):
     return element.tag, \
            dict(map(recursive_dict, element)) or element.text

But when I try to use it, I get the following:

>>> r = requests.get('http://localhost:8983/solr/admin/cores?action=STATUS')
>>> xml_dict = recursive_dict(lxml.etree.parse(StringIO.StringIO(r.content)))

AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'

Am I missing a step that converts the ElementTree into an Element?


2 Answers


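lxml.etree.parse returns an ElementTree object, not an Element object. From the documentation:

An ElementTree is mainly a document wrapper around a tree with a root node. It provides a couple of methods for serialisation and general document handling.

ElementTree.getroot() returns the root element of the document, and that element is what recursive_dict expects: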

xml_doc = lxml.etree.parse(StringIO.StringIO(r.content))
xml_dict = recursive_dict(xml_doc.getroot())
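
To make the difference concrete without a running Solr instance, here is a minimal sketch, assuming Python 3. It uses the standard library's xml.etree.ElementTree as a stand-in, since it exposes the same parse()/getroot() interface as lxml.etree; the XML payload is made up for illustration and is not real Solr output.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree; same parse()/getroot() interface


def recursive_dict(element):
    # The FAQ recipe: (tag, dict of children), or (tag, text) for leaf elements.
    return element.tag, dict(map(recursive_dict, element)) or element.text


# Hypothetical payload standing in for r.content from the request above.
content = b"<response><status>0</status><core><name>demo</name></core></response>"

tree = ET.parse(io.BytesIO(content))  # an ElementTree: a document wrapper with no .tag
root = tree.getroot()                 # the root Element, which does have .tag

tag, children = recursive_dict(root)
print(tag)       # response
print(children)  # {'status': '0', 'core': {'name': 'demo'}}
```

Note that on Python 3 the bytes in r.content go through io.BytesIO rather than StringIO.StringIO, which only exists on Python 2.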

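Edit

Here is a variant of recursive_dict that may be a better fit, since it also keeps text, tail, attributes, and repeated children: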

def recursive_dict(element):
    retval = {}
    retval["tag"] = element.tag
    if element.text:
        retval["text"] = element.text

    if element.tail:
        retval["tail"] = element.tail

    if element.attrib:
        retval["attributes"] = element.attrib

    if len(element) > 0:
        retval["children"] = [recursive_dict(child_element) for child_element in element]

    return retval
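
As a rough illustration of the structure this variant produces, the sketch below runs it on a small made-up document. It assumes Python 3 and uses the standard library's xml.etree.ElementTree in place of lxml.etree (the tag, text, tail, and attrib attributes behave the same way for this input). Copying attrib into a plain dict is my own adjustment so the result is easy to compare and serialize.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


def recursive_dict(element):
    retval = {"tag": element.tag}
    if element.text:
        retval["text"] = element.text
    if element.tail:
        retval["tail"] = element.tail
    if element.attrib:
        # Assumption: copy into a plain dict instead of keeping the attrib proxy.
        retval["attributes"] = dict(element.attrib)
    if len(element) > 0:
        retval["children"] = [recursive_dict(child) for child in element]
    return retval


# Made-up input: two siblings with the same tag, told apart by an attribute.
root = ET.fromstring('<lst name="status"><str name="a">x</str><str name="b">y</str></lst>')
result = recursive_dict(root)

print(result["tag"], result["attributes"])
for child in result["children"]:
    print(child)
```

Because children are stored in a list, siblings that share a tag are all preserved in document order, at the cost of no longer being addressable by tag name.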
answered 2013-10-28T20:37:23.147

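I realize I'm about 7.5 years late to this, but the suboptimal implementation is still unchanged in the FAQ. I wanted to share my solution here because this page is a prominent search result for this problem, and someone may eventually find it useful.

For my use case, I wanted a version somewhere between the FAQ's and what codeape provided. This version lets you access child nodes by tag alone, but if several children share the same tag, you get a list of their values rather than only the last one. It should also be easy to adapt if you need more bells and whistles.

This is what I ended up using: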

def recursive_dict(element):
    """Takes an lxml element and returns a corresponding nested python dictionary.
       If there's multiple child elements with same tag, it will have a list of them.
       Improvement on https://lxml.de/FAQ.html#how-can-i-map-an-xml-tree-into-a-dict-of-dicts"""
    
    # Trivial case returns only the element text.
    if len(element) == 0:
        return element.text
    
    # Nested case returns a proper dictionary.
    else:
        retval = {}
        
        for child in element:
            # Recursive call computed, but not placed yet.
            recurse = recursive_dict(child)
            
            # No previous entry means it's now a single entry.
            if child.tag not in retval:
                retval[child.tag] = recurse
                
            # Previous single entry means it's now a list.
            elif type(retval[child.tag]) is not list:
                oldval = retval[child.tag]
                retval[child.tag] = [oldval, recurse]
                
            # Previous list entry means the list gets appended.
            else:
                oldlist = retval[child.tag]
                retval[child.tag] = oldlist + [recurse]
                
        return retval
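
To show the difference from the FAQ recipe, here is a small side-by-side sketch, assuming Python 3 and using the standard library's xml.etree.ElementTree in place of lxml.etree. The name faq_recursive_dict and the sample XML are made up for this comparison, and the second function is a condensed restatement of the code above rather than a verbatim copy.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree


def faq_recursive_dict(element):
    # The FAQ recipe: a later child with the same tag overwrites an earlier one.
    return element.tag, dict(map(faq_recursive_dict, element)) or element.text


def recursive_dict(element):
    # Leaf elements map to their text; nested elements map to a dict keyed by tag.
    if len(element) == 0:
        return element.text
    retval = {}
    for child in element:
        value = recursive_dict(child)
        if child.tag not in retval:
            retval[child.tag] = value                       # first occurrence
        elif not isinstance(retval[child.tag], list):
            retval[child.tag] = [retval[child.tag], value]  # second: promote to list
        else:
            retval[child.tag].append(value)                 # third and later
    return retval


root = ET.fromstring("<cores><core>a</core><core>b</core><count>2</count></cores>")

print(faq_recursive_dict(root)[1])  # {'core': 'b', 'count': '2'}
print(recursive_dict(root))         # {'core': ['a', 'b'], 'count': '2'}
```

One trade-off to keep in mind: a tag that appears once yields a plain value and a tag that repeats yields a list, so callers have to handle both shapes.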
answered 2021-05-07T16:51:31.247