0

extension of: recursing a dictionary of lists of dictionaries, etc et al (python)

I'm working with a nested dictionary structure of 4 levels, I'm trying to iterate of the entire nested dictionary and give each individual dictionary an identification number(as a precursor to building a tree of the items and being able tell which item node is parent, which children a node has etc.)

I have this function:

def r(y):
    cnt = 1
    def recurse(y, count):
        for i in y.iteritems():
            count+=1
            i['id'] = count
            for k,v in y.iteritems():
                if isinstance(v, list):
                    [recurse(i, count) for i in v]
                else:
                    pass
    recurse(y, cnt)
    return y

I put in my nested dictionary of lists of dictionaries,

and I get a mess, i.e. doesn't work like I thought it would.

{'sections': [{'id': 11, 'info': 'This is section ONE', 'tag': 's1'},
              {'fields': [{'id': 15,
                           'info': 'This is field ONE',
                           'tag': 'f1'},
                          {'elements': [{'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e1',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e2',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e3',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e4',
                                         'type_of': 'text_field'}],
                           'id': 16,
                           'info': 'This is field TWO',
                           'tag': 'f2'},
                          {'elements': [{'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e5',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e6',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element',
                                         'tag': 'e7',
                                         'type_of': 'text_field'},
                                        {'id': 20,
                                         'info': 'This is element ONE',
                                         'tag': 'e8',
                                         'type_of': 'text_field'}],
                           'id': 16,
                           'info': 'This is field THREE',
                           'tag': 'f3'}],
               'id': 12,
               'info': 'This is section TWO',
               'tag': 's2'},
              {'fields': [{'id': 15,
                           'info': 'This is field FOUR',
                           'tag': 'f4'},
                          {'id': 15,
                           'info': 'This is field FIVE',
                           'tag': 'f5'},
                          {'id': 15,
                           'info': 'This is field SIX',
                           'tag': 'f6'}],
               'id': 12,
               'info': 'This is section THREE',
               'tag': 's3'}],
 'tag': 'test'}

What I want to happen is that all items in level one are numbered, then all items in level two are numbered, then the third level, then the fourth. In this case the main item should be given an id of 1, then the sections be identified as 2,3,4 then fields as 5 on, then elements, etc. Looking back on this after sleeping on it I can see it as a start, but quite wrong.

EDIT: What I really need to do is create a tree of parent/child nodes from a nested dictionary structure so that I can iterate/insert/get/work with as needed the items from this tree. Is there a quick way to do that? I seem to be doing more work than I anticipated.

EDIT2: I found a solution to my original question. I just decided to use the in built id() function instead of an extra step of adding an id, and was able to create the minimal tree I needed, but this is still useful an exercise.

4

4 回答 4

0

要考虑的替代方法是双向链表。例如:

Index  Tag     Parent  Children        Info
0      test    -1      [s1,s2,s3]      ""
1      s1      0       []              "This is section ONE"
2      s2      0       [f1,f2,f3]      "This is section TWO"
3      f1      2       []              "This is field ONE"
4      f2      2       [e1,e2,e3,e4]   "This is field TWO"
5      e1      4       []              "This is element"
6      e2      4       []              "This is element"
       .
       .
       .

这是一个概念表示,实际实现将使用子列的数字行索引而不是标签,因为您的输入数据可能是脏的,带有重复或缺失的标签,并且您不想构建依赖于标签的结构是独一无二的。可以轻松添加其他列。

您可以通过递归遍历树来构建表,但通过使用平面表(列表的 2D 列表)中的行来引用它们可能更容易处理树中的项目。

编辑:这是您对原始问题(未修饰的节点列表)的解决方案的扩展,它将结构化信息(标签、父级、子级等)添加到每个节点。如果您需要在树上上下导航,这可能很有用。

编辑:这段代码:

def recurse(y, n=[], p=-1):
    node = ["", p, [], "", ""]   # tag, parent, children, type, info
    vv = []
    for k,v in y.items():
        if k == "tag":
            node[0] = v
        elif k == "info":
            node[4] = v
        elif isinstance(v, list):
            node[3] = k
            vv = v
    n.append(node)
    p = len(n)-1
    for i in vv:
        n[p][2].append(len(n))
        n = recurse(i, n, p)
    return(n)

nodes = recurse(a)
for i in range(len(nodes)):
    print(i, nodes[i])

产生(为了便于阅读,手动分隔成列):

 0 ['test', -1, [1, 2, 14],     'sections',   '']
 1 [  's1',  0, [],             '',           'This is section ONE']
 2 [  's2',  0, [3, 4, 9],      'fields',     'This is section TWO']
 3 [  'f1',  2, [],             '',           'This is field ONE']
 4 [  'f2',  2, [5, 6, 7, 8],   'elements',   'This is field TWO']
 5 [  'e1',  4, [],             '',           'This is element']
 6 [  'e2',  4, [],             '',           'This is element']
 7 [  'e3',  4, [],             '',           'This is element']
 8 [  'e4',  4, [],             '',           'This is element']
 9 [  'f3',  2, [10, 11, 12, 13], 'elements', 'This is field THREE']
10 [  'e5',  9, [],             '',           'This is element']
11 [  'e6',  9, [],             '',           'This is element']
12 [  'e7',  9, [],             '',           'This is element']
13 [  'e8',  9, [],             '',           'This is element ONE']
14 [  's3',  0, [15, 16, 17],   'fields',     'This is section THREE']
15 [  'f4', 14, [],             '',           'This is field FOUR']
16 [  'f5', 14, [],             '',           'This is field FIVE']
17 [  'f6', 14, [],             '',           'This is field SIX']
于 2012-09-26T17:13:57.773 回答
0

您会得到重复的 id,因为您的count变量是本地的,一旦recurse函数退出,对它的任何更改都会丢失。你可以通过声明一个全局变量来绕过它,但由于你没有使用 的返回值recurse,你可以使用它来代替:

def r(y):
    def recurse(y, count):
        y['id'] = count
        count += 1
        for k,v in y.iteritems():
            if isinstance(v, list):
                for i in v:
                    count = recurse(i, count)
        return count
    recurse(y, 1)
    return y

编辑:刚刚意识到你正在寻找一个广度优先的 id 分配......这不会实现这一点,但我会留下答案,因为它可能有助于你开始。

于 2012-09-26T13:26:53.600 回答
0

这是一个用迭代器替换count变量的解决方案:itertools.count

from itertools import count
def r(y):
    counter = count()
    def recurse(y, counter):
        for i in y.iteritems():
            i['id'] = next(counter)
            for k,v in y.iteritems():
                if isinstance(v, list):
                    [recurse(i, counter) for i in v]
                else:
                    pass
    recurse(y, counter)
    return y

itertools.count() 将创建一个生成器,每次调用 next() 时都会返回下一个整数。您可以将它传递给递归函数,并确保不会创建重复的 id。

于 2012-09-26T16:38:16.310 回答
0

好吧,我有一个使用深度和父级来设置 ID 的解决方案:

>>> def decorate_tree(tree, parent=None, index=None):
    global ID
    if type(tree) == type({}):
        if parent is None:
            parent = '1'
            tree['id'] = parent
        else:
            tree['id'] = '{0}.{1}'.format(parent, index)
        if 'info' in tree:
            print tree['info'], '=>', tree['id']
        child_index = 1
        for key in tree:
            if type(tree[key]) == type([]):
                for item in tree[key]:
                    decorate_tree(item, tree['id'], child_index)
                    child_index += 1


>>> decorate_tree(d)
This is section ONE => 1.1
This is section TWO => 1.2
This is field ONE => 1.2.1
This is field TWO => 1.2.2
This is element => 1.2.2.1
This is element => 1.2.2.2
This is element => 1.2.2.3
This is element => 1.2.2.4
This is field THREE => 1.2.3
This is element => 1.2.3.1
This is element => 1.2.3.2
This is element => 1.2.3.3
This is element ONE => 1.2.3.4
This is section THREE => 1.3
This is field FOUR => 1.3.1
This is field FIVE => 1.3.2
This is field SIX => 1.3.3
>>> from pprint import pprint
>>> pprint(d)
{'id': '1',
 'sections': [{'id': '1.1', 'info': 'This is section ONE', 'tag': 's1'},
              {'fields': [{'id': '1.2.1',
                           'info': 'This is field ONE',
                           'tag': 'f1'},
                          {'elements': [{'id': '1.2.2.1',
                                         'info': 'This is element',
                                         'tag': 'e1',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.2.2',
                                         'info': 'This is element',
                                         'tag': 'e2',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.2.3',
                                         'info': 'This is element',
                                         'tag': 'e3',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.2.4',
                                         'info': 'This is element',
                                         'tag': 'e4',
                                         'type_of': 'text_field'}],
                           'id': '1.2.2',
                           'info': 'This is field TWO',
                           'tag': 'f2'},
                          {'elements': [{'id': '1.2.3.1',
                                         'info': 'This is element',
                                         'tag': 'e5',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.3.2',
                                         'info': 'This is element',
                                         'tag': 'e6',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.3.3',
                                         'info': 'This is element',
                                         'tag': 'e7',
                                         'type_of': 'text_field'},
                                        {'id': '1.2.3.4',
                                         'info': 'This is element ONE',
                                         'tag': 'e8',
                                         'type_of': 'text_field'}],
                           'id': '1.2.3',
                           'info': 'This is field THREE',
                           'tag': 'f3'}],
               'id': '1.2',
               'info': 'This is section TWO',
               'tag': 's2'},
              {'fields': [{'id': '1.3.1',
                           'info': 'This is field FOUR',
                           'tag': 'f4'},
                          {'id': '1.3.2',
                           'info': 'This is field FIVE',
                           'tag': 'f5'},
                          {'id': '1.3.3',
                           'info': 'This is field SIX',
                           'tag': 'f6'}],
               'id': '1.3',
               'info': 'This is section THREE',
               'tag': 's3'}],
 'tag': 'test',
 'type_of': 'custom'}
>>> 

所以 ID 1.3.4 的父级是 ID 1.3,兄弟级是 ID 1.3.x,子级是 1.3.4.x ......这样检索和插入应该不会太难(移位索引)。

于 2012-09-26T15:51:46.740 回答