
I have a JSON object like

 {
        "id": 3590403096656,
        "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
        "tags": [
            "1ST THE WORLD FOR YOU <3",
            "apparel"
        ],
        "props": [
            {
                "id": 28310659235920,
                "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            },
            {
                "id": 444444444444,
                "title": "number 2",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            }
        ]
}

I want to flatten it, so the desired output looks like this:

{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 28310659235920,"props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 444444444444,"props.title": "number 2","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}

So far I have tried:

import pandas as pd
# pandas >= 1.0; in older versions: from pandas.io.json import json_normalize
pd.json_normalize(sample_object)

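where sample_object contains the JSON object. I'm looping through a large file of such objects, each of which I want to flatten into the desired format.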

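json_normalize doesn't give me the output I want. I want to keep tags as-is, but flatten props and repeat the parent object's information for each entry in props.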
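Because the objects come from a large file, it can help to stream them one at a time instead of loading everything at once. The following is a minimal stdlib-only sketch. It assumes the file is JSON Lines (one object per line) and that `props` is the only list to expand. `iter_flat_rows` and the file name `products.jsonl` are hypothetical, not part of pandas:

```python
import io
import json


def iter_flat_rows(lines, list_key="props"):
    """Yield one flat dict per entry of obj[list_key], for each JSON line.

    Assumes JSON Lines input (one object per line); blank lines are skipped.
    Objects with an empty or missing list_key produce no rows.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # Parent fields shared by every row; "tags" stays a list, untouched
        parent = {k: v for k, v in obj.items() if k != list_key}
        for entry in obj.get(list_key, []):
            row = dict(parent)
            # Prefix child keys, e.g. "id" -> "props.id"
            row.update({f"{list_key}.{k}": v for k, v in entry.items()})
            yield row


# In-memory stand-in for a hypothetical open("products.jsonl")
sample = io.StringIO(
    '{"id": 1, "tags": ["a", "b"], "props": [{"id": 10}, {"id": 11}]}\n'
)
for row in iter_flat_rows(sample):
    print(json.dumps(row))
# {"id": 1, "tags": ["a", "b"], "props.id": 10}
# {"id": 1, "tags": ["a", "b"], "props.id": 11}
```

Because it is a generator, only one object and its rows are held in memory at a time. The rows can be written straight back out or collected into a DataFrame in batches.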


2 Answers


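You want some of json_normalize's behavior, but with a custom twist. So use json_normalize (or something similar) on part of the data, then combine the result with the rest of the data.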

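The code below takes the "or something similar" route, reaching into the pandas codebase for the nested_to_record helper function, which flattens dicts. It builds each row by combining the base data (the keys/values common to all props) with the flattened data specific to that props entry. A commented-out line does the same thing without nested_to_record, but it somewhat inelegantly flattens to a DataFrame and then exports that back to a dict.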
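nested_to_record lives in pandas internals, and its module path has changed between pandas versions. A small hand-rolled equivalent may therefore be more robust. The sketch below is a simplified stand-in, not pandas' implementation. It joins nested dict keys with a separator and passes lists (such as tags) through unchanged. The name `flatten_dict` is hypothetical:

```python
def flatten_dict(d, parent_key="", sep="."):
    """Flatten nested dicts into a single level with joined keys.

    Simplified stand-in for pandas' internal nested_to_record: only dicts
    are recursed into; lists and scalars are copied through unchanged.
    """
    flat = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key prefix down
            flat.update(flatten_dict(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


prop = {"id": 444444444444, "title": "number 2", "position": 1}
print(flatten_dict({"props": prop}))
# {'props.id': 444444444444, 'props.title': 'number 2', 'props.position': 1}
```

In the loop, `flatten_dict({'props': prop})` could replace the `nested_to_record` call one-for-one for the flat props shown in the question.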

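from collections import OrderedDict
import json
import pandas as pd
try:
    # pandas >= 0.25 moved the helper into a private module
    from pandas.io.json._normalize import nested_to_record
except ImportError:
    # older pandas (e.g. 0.24)
    from pandas.io.json.normalize import nested_to_record

# rawjson is one JSON object as a string (e.g. one record read from the file)
data = json.loads(rawjson)
props = data.pop('props')  # what remains in data is shared by every row
rows = []
for prop in props:
    rowdict = OrderedDict(data)  # start from the shared parent fields
    flattened_prop = nested_to_record({'props': prop})  # {'props.id': ..., ...}
    # flattened_prop = pd.json_normalize({'props': prop}).to_dict(orient='records')[0]
    rowdict.update(flattened_prop)
    rows.append(rowdict)

df = pd.DataFrame(rows)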

Resulting in:

[image: output DataFrame]

answered 2019-06-22 17:49

Please try this:

import copy

obj =  {
        "id": 3590403096656,
        "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
        "tags": [
            "1ST THE WORLD FOR YOU <3",
            "apparel",
        ],
        "props": [
            {
                "id": 28310659235920,
                "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            },
            {
                "id": 444444444444,
                "title": "number 2",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            }
        ]
}

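# Remove "props"; what remains in obj is shared by every output row
props = obj.pop("props")

for p in props:
    # Start each row from a fresh copy of the shared parent fields
    res = copy.deepcopy(obj)
    # Prefix each prop key with "props." so it can't clash with parent keys
    for k in p:
        res["props." + k] = p[k]
    print(res)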

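Basically, it uses pop("props") to get obj without "props" (that remainder is the common part used in every resulting object).

Then we iterate over the props, creating a new object from the base object for each one and filling in "props.<key>" for every key in that prop.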
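Note that print(res) prints Python dict reprs (single quotes), whereas the desired output in the question is one JSON object per line. The variation below assumes JSON Lines is the target and serializes each row with json.dumps. It also builds the base dict with a comprehension instead of pop, so the original object is left intact. A shallow copy is enough here because the shared values, such as the tags list, are never modified. The props entries are trimmed to a few fields for brevity:

```python
import json

obj = {
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": ["1ST THE WORLD FOR YOU <3", "apparel"],
    "props": [
        {"id": 28310659235920,
         "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
         "position": 1},
        {"id": 444444444444, "title": "number 2", "position": 1},
    ],
}

# Everything except "props" is shared by all output rows; obj is not mutated
base = {k: v for k, v in obj.items() if k != "props"}

lines = []
for p in obj["props"]:
    row = dict(base)  # shallow copy: shared values are only read, never changed
    row.update({"props." + k: v for k, v in p.items()})
    lines.append(json.dumps(row, ensure_ascii=False))

print("\n".join(lines))
```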

answered 2019-06-22 15:19