json - Python glom with a list of records: grouping entries that share a client_id under that id as a key

Question

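I just discovered glom, and the tutorial makes sense, but I can't work out the right spec to use on Chrome's BrowserHistory.json entries to build a data structure grouped by client_id, or whether this is even the right use of glom. I think I could do this some other way by looping over the json, but I was hoping to learn more about glom and its capabilities.

The json has a Browser_History key holding a list with one entry per history item, like this: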

{
    "Browser_History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com",
            "client_id": "abcd1234",
            "time_usec": 1424794867875291
        },
...

I want a data structure where everything is grouped by client_id, e.g. with each client_id as the key to a list of dicts:

{ 'client_ids' : {
                'abcd1234' : [ {
                                 "title" : "Google Takeout",
                                 "url"   : "https://takeout.google.com",
                                 ...
                             },
                             ...
                             ],
                'wxyz9876' : [ {
                                 "title" : "Google",
                                 "url"   : "https://www.google.com",
                                 ...
                             },
                             ...
              }
}

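Is this something glom is suited for? I've been playing around with it and reading, but I can't seem to get the spec right for what I need. The best I've managed without an error is: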

import json
from pprint import pprint
from glom import glom

with open(history_json) as f:
    history_list = json.load(f)['Browser_History']

spec = {
    'client_ids' : ['client_id']
}
pprint(glom(history_list, spec))

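This gets me a list of all the client_ids, but I can't figure out how to group the entries under them as keys rather than having them in one big list. Any help would be appreciated, thanks!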

score 0 · Accepted Answer

This should do the trick, although I'm not sure it's the most "glom"-ic way to achieve it.

import glom

grouping_key = "client_id"  # field in each history entry to group by

def group_combine(existing, incoming):
    # existing is the dictionary used for accumulating the groups
    # incoming is each item in the list (your input)
    if grouping_key in incoming:
        # entries without a client_id are skipped; the rest are appended
        # to the list stored under their client_id
        existing.setdefault(incoming[grouping_key], []).append(incoming)
    return existing


data = {'Browser_History': [{}]}  # your data structure

fold_spec = glom.Fold(glom.T, init=dict, op=group_combine)
results = glom.glom(data["Browser_History"], {"client_ids": fold_spec})
