2

(最初来自上一个问题,但为更一般的问题重新构建)

这是我正在使用 2 条记录的示例 json 文件:

[{"Time":"2016-01-10",
"ID"
:13567,
"Content":{
    "Event":"UPDATE",
    "Id":{"EventID":"ABCDEFG"},
    "Story":[{
        "@ContentCat":"News",
        "Body":"Related Meeting Memo: Engagement with target firm for potential M&A.  Please be on call this weekend for news updates.",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Amy","Score":70}, {"Id":"Jon","Score":70}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Performance","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":50,
                    "Headline":"Attn: Weekend",
                    "WireId":2035,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160110",
"month":201601},
{"Time":"2016-01-12",
"ID":13568,
"Content":{
    "Event":"DEAL",
    "Id":{"EventID":"ABCDEFG2"},
    "Story":[{
        "@ContentCat":"Details",
        "Body":"Test email contents",
        "BodyTextType":"PLAIN_TEXT",
        "DerivedId":{"Entity":[{"Id":"Bob","Score":100}, {"Id":"Jon","Score":70}, {"Id":"Jack","Score":60}]},
        "DerivedTopics":{"Topics":[
                            {"Id":"Meeting","Score":70},
                            {"Id":"Engagement","Score":100},
                            {"Id":"Salary","Score":70},
                            {"Id":"Career","Score":100}]
                        },
        "HotLevel":0,
        "LanguageString":"ENGLISH",
        "Metadata":{"ClassNum":70,
                    "Headline":"Attn: Weekend",
                    "WireId":2037,
                    "WireName":"IIS"},
        "Version":"Original"}
                ]},
"yyyymmdd":"20160112",
"month":201602}]

我正在尝试获取实体 ID 级别的数据框(从记录 1中提取Amy和,从记录 2中提取)。我该怎么做呢?为了澄清,级别是(内容 > 故事 > DerivedID > 实体 > Id)JonBobJonJack

4

2 回答 2

2

使用list comprehension,您可以深入了解该结构,例如:

with open('test.json', 'rU') as f:
    data = json.load(f)

df = pd.DataFrame(sum([i['Content']['Story'][0]['DerivedId']['Entity']
                       for i in data], []))

print(df)

或者,如果您有大量数据并且不想进行笨重的sum()使用itertools.chain.from_iterable,例如:

import itertools as it
df = pd.DataFrame.from_records(it.chain.from_iterable(
    i['Content']['Story'][0]['DerivedId']['Entity'] for i in data))

结果:

     Id  Score
0   Amy     70
1   Jon     70
2   Bob    100
3   Jon     70
4  Jack     60
于 2018-07-08T22:58:33.190 回答
0
df = pd.json_normalize(data, ['Content', 'Story', 'DerivedId', 'Entity'])
print(df)

请记住,最终根必须是您的 json 中的列表

结果:

     Id  Score
0   Amy     70
1   Jon     70
2   Bob    100
3   Jon     70
4  Jack     60

如果你只想要 id

df[['Id']]
于 2022-02-12T04:00:22.303 回答