
I am trying to load a JSON file, message_1.json, whose data is structured as follows:

{
  "participants": [
    {
      "name": "X"
    },
    {
      "name": "Y"
    }
  ],
  "messages": [
    {
      "sender_name": "X",
      "timestamp_ms": 1608148658236,
      "content": "Hi",
      "type": "Generic"
    },
    {
      "sender_name": "Y",
      "timestamp_ms": 1608148604578,
      "content": "Hey",
      "type": "Generic"
    },
    {
      "sender_name": "X",
      "timestamp_ms": 1608148599875,
      "content": "Bye",
      "type": "Generic"
    },

    ...
    ...
    ...

I tried df = pd.read_json("message_1.json"), which fails with ValueError: arrays must all be same length.

I believe a similar question has been asked here before, and I tried the solution suggested there:

with open('message_1.json') as json_data:
    data = json.load(json_data) # Data is successfully loaded here in the above format

pd.DataFrame.from_dict(data, orient='timestamp_ms').T.set_index('timestamp_ms') 

Since the data has no index column, I tried to use timestamp_ms as the index, which raises ValueError: only recognize index or columns for orient

I suspect I am loading the JSON with the wrong orientation.

Please advise.


1 Answer


Judging from the JSON input, it looks like you only want to keep the "messages" part.

import json
import pandas as pd

with open('message_1.json') as json_data:
    data = json.load(json_data)

messages = data['messages']

# messages is a list, not a dict
df = pd.DataFrame(messages)

df.set_index('timestamp_ms', inplace=True)

Output:

              sender_name content     type
timestamp_ms                              
1608148658236           X      Hi  Generic
1608148604578           Y     Hey  Generic
1608148599875           X     Bye  Generic
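
As a side note on why the two attempts in the question fail: the top-level keys "participants" and "messages" hold lists of different lengths, so they cannot be turned into columns of a single frame, and orient in from_dict only selects whether the dict keys become rows or columns; it is not a column name. The following is a minimal sketch of an alternative, not the only way to do it. It uses pd.json_normalize with record_path to pull out the messages, and it assumes that timestamp_ms means milliseconds since the Unix epoch (inferred from the field name only). The data is an inline copy of the three sample messages from the question so the snippet runs without the file, and the names mismatch_error and the "timestamp" column are made up for illustration.

```python
import pandas as pd

# Inline copy of the sample from the question; the real file contains more messages.
data = {
    "participants": [{"name": "X"}, {"name": "Y"}],
    "messages": [
        {"sender_name": "X", "timestamp_ms": 1608148658236, "content": "Hi", "type": "Generic"},
        {"sender_name": "Y", "timestamp_ms": 1608148604578, "content": "Hey", "type": "Generic"},
        {"sender_name": "X", "timestamp_ms": 1608148599875, "content": "Bye", "type": "Generic"},
    ],
}

# The two top-level lists have different lengths (2 vs 3), so building a frame
# directly from the whole dict raises ValueError.
try:
    pd.DataFrame(data)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# record_path="messages" produces one row per entry in data["messages"].
df = pd.json_normalize(data, record_path="messages")

# Assumption: timestamp_ms is milliseconds since the Unix epoch.
df["timestamp"] = pd.to_datetime(df["timestamp_ms"], unit="ms")

# Index by the parsed timestamp and put the messages in chronological order.
df = df.set_index("timestamp").sort_index()
```

When reading from disk, data would come from json.load as in the answer above. Whether a datetime index is preferable to the raw integer depends on what you plan to do with the messages; it mainly helps with sorting, slicing by date, and resampling.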
Answered 2020-12-17T08:05:40.603