
Imagine I hit an API and it returns a multi-level JSON blob. I then want to pull specific values out of that blob and upload them to a database, so I need to flatten it.

Basically I want to get from something like this:

d1 = {'results': [
        {'a': 1, 'b': 10},
        {'a': 2, 'b': 20},
        {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
        {'a': 4, 'c': {'d': 200, 'e': 2000}}
    ]
}

To something like this (ideally with the labels adjusted to reflect the original hierarchy):

d2 = [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c.d': 100},
    {'a': 4, 'c.d': 200}
]

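I feel like jsonpath or objectpath should be able to do this, but I can't get it to work. I could easily loop over this one example by hand, but I have a bunch of these to do, so something more "declarative" would be preferable.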

I must be missing something about how these paths work. Here's my attempt:

from objectpath import Tree

# starting here...
d1 = {'results': [
        {'a': 1, 'b': 10},
        {'a': 2, 'b': 20},
        {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
        {'a': 4, 'c': {'d': 200, 'e': 2000}}
    ]
}

# trying to get here...
# d2 = [
#     {'a': 1, 'b': 10},
#     {'a': 2, 'b': 20},
#     {'a': 3, 'b': 30, 'c.d': 100},
#     {'a': 4, 'c.d': 200}
# ]

if __name__ == "__main__":
    t = Tree(d1)
    print([x for x in t.execute('$.results.a')])  # works to get value of a
    print([x for x in t.execute('$.results.(a,b)')])  # creates dictionary of a & b -- cool
    print([x for x in t.execute('$.results.(a,b,c)')])  # adds all of c's sub document, makes sense
    print([x for x in t.execute('$.results.(a,b,c.d)')])  # nothing changed?
    print([x for x in t.execute('$.results.*')])  # selects everything, sure
    print([x for x in t.execute('$.results.*["a"]')])  # just "a" value again, makes sense
    print([x for x in t.execute('$.results.*["a" or "b"]')])  # apparently this means HAS "A" or "B" -- weird?
    print([x for x in t.execute('$.results..(a,b,d)')])  # almost works but puts d in its own dictionary?!
    print([x for x in t.execute('{"a": $.results.a, "b": $.results.b, "c.d":  $.results.c.d}')])  # what I would expect, but not even close

Results:

[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[1, 2, 3, 4]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'c': {'d': 100, 'e': 1000}, 'a': 3}, {'c': {'d': 200, 'e': 2000}, 'a': 4}]
[{'b': 10, 'a': 1}, {'b': 20, 'a': 2}, {'b': 30, 'a': 3}, {'d': 100}, {'a': 4}, {'d': 200}]
['b', 'a', 'c.d']

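I seem to be close, but maybe I'm going about this completely the wrong way? Would something like marshmallow be better? That seems like overkill, since I would have to define a class hierarchy. Thanks!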


1 Answer


This is a simple recursion:

from pprint import pprint


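def flat_dict(d: dict) -> dict:
    """Flatten nested dicts, joining keys with '.' (e.g. {'c': {'d': 1}} -> {'c.d': 1})."""
    o = {}
    for k, v in d.items():
        if isinstance(v, dict):
            # Recurse into the sub-dict and prefix each of its keys with the parent key
            o.update({
                k + '.' + key: value
                for key, value in flat_dict(v).items()
            })
        else:
            # Leaf value: copy it through unchanged
            o[k] = v
    return o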


def main():
    d = {
        'result': [
            {'a': 1, 'b': 10},
            {'a': 2, 'b': 20},
            {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
            {'a': 4, 'c': {'d': 200, 'e': 2000}}
        ]
    }

    res = [
        flat_dict(e)
        for e in d['result']
    ]
    pprint(res)


if __name__ == '__main__':
    main()

Result:

[{'a': 1, 'b': 10},
 {'a': 2, 'b': 20},
 {'a': 3, 'b': 30, 'c.d': 100, 'c.e': 1000},
 {'a': 4, 'c.d': 200, 'c.e': 2000}]
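Note that flat_dict keeps every leaf, so rows 3 and 4 also get c.e, which the question's expected d2 leaves out. One way to close that gap is to flatten first and then keep only a whitelist of dotted keys. The sketch below uses a hypothetical select_keys helper, and flat_dict is repeated so the snippet runs on its own.

```python
def flat_dict(d: dict) -> dict:
    # Same recursion as in the answer: nested dict keys are joined with '.'
    o = {}
    for k, v in d.items():
        if isinstance(v, dict):
            o.update({k + '.' + key: value for key, value in flat_dict(v).items()})
        else:
            o[k] = v
    return o


def select_keys(rows: list, keep: list) -> list:
    """Hypothetical helper: keep only the wanted flattened keys in each row.

    Keys missing from a row are skipped; output key order follows `keep`.
    """
    return [{k: row[k] for k in keep if k in row} for row in rows]


d = {'results': [
    {'a': 1, 'b': 10},
    {'a': 2, 'b': 20},
    {'a': 3, 'b': 30, 'c': {'d': 100, 'e': 1000}},
    {'a': 4, 'c': {'d': 200, 'e': 2000}},
]}

flat = [flat_dict(e) for e in d['results']]
d2 = select_keys(flat, ['a', 'b', 'c.d'])
print(d2)
```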
answered 2018-01-04T23:20:37.430