I'm working with a JSON file that contains nested fields (arrays), and I'm trying to convert it into a pandas DataFrame.

{
    "_id": "2026",
    "dataDate": 1537920000000,
    "dataYear": 2018,
    "groupId": "1378",
    "HourConsumed": 19781.4,
    "HourGenerated": 0,
    "max": 4658.400000000001,
    "maxGen": 0,
    "maxTime": 1538001000000,
    "avg": -206.05625,
    "max": 0,
    "maxGen": 0,
    "maxTime": null,
    "avgTemp": 0,
    "me_Id": "2004506_3166155129",
    "interval": 15,
    "intervalMetaData": [
        "whC",
        "whG",
        "max",
        "maxGen",
        "hC",
        "hG",
        "maxVar",
        "maxGen",
        "avgTemp",
        "eventTime"
    ],
    "intervalData": [
        [
            175.2,
            0,
            700.8,
            0,
            0,
            0,
            0,
            0,
            0,
            1537920900000
        ],
        [
            192,
            0,
            768,
            0,
            0,
            0,
            0,
            0,
            0,
            1537921800000
        ],
        [
            191.39999999999998,
            0,
            765.5999999999999,
            0,
            0,
            0,
            0,
            0,
            0,
            1537922700000
        ]
    ]
}

I need to create a separate column for each entry in intervalMetaData, and then fill those columns with the values from intervalData. Is this possible?

2 Answers

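You bet it's possible! Assuming pandas is imported as pd and j is the dict parsed from your JSON, it's as simple as this: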

df = pd.DataFrame(j['intervalData'], columns=j['intervalMetaData'])
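
For context, here is a minimal sketch of where j might come from, assuming the JSON is available as text. The string raw is a shortened, hypothetical stand-in for the file contents (only two metadata fields and two rows are kept). With a real file, json.load on an open file handle should give the same kind of dict.

```python
import json

import pandas as pd

# Shortened stand-in for the file contents (hypothetical; trimmed to two
# metadata fields and two rows to keep the example compact).
raw = """
{
    "_id": "2026",
    "intervalMetaData": ["whC", "eventTime"],
    "intervalData": [
        [175.2, 1537920900000],
        [192, 1537921800000]
    ]
}
"""

# Parse the JSON text into a plain Python dict.
j = json.loads(raw)

# Each inner list of intervalData becomes one row;
# intervalMetaData supplies the column names.
df = pd.DataFrame(j["intervalData"], columns=j["intervalMetaData"])
print(df)
```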
Answered 2021-11-08T14:28:57.427

If I understand correctly, you just need to set the columns properly when loading the lists with pandas:

import pandas as pd

data = {
    "_id": "2026",
    "dataDate": 1537920000000,
    "dataYear": 2018,
    "groupId": "1378",
    "HourConsumed": 19781.4,
    "HourGenerated": 0,
    "max": 4658.400000000001,
    "maxGen": 0,
    "maxTime": 1538001000000,
    "avg": -206.05625,
    "max": 0,
    "maxGen": 0,
    "maxTime": None,
    "avgTemp": 0,
    "me_Id": "2004506_3166155129",
    "interval": 15,
    "intervalMetaData": [
        "whC",
        "whG",
        "max",
        "maxGen",
        "hC",
        "hG",
        "maxVar",
        "maxGen",
        "avgTemp",
        "eventTime"
    ],
    "intervalData": [
        [
            175.2,
            0,
            700.8,
            0,
            0,
            0,
            0,
            0,
            0,
            1537920900000
        ],
        [
            192,
            0,
            768,
            0,
            0,
            0,
            0,
            0,
            0,
            1537921800000
        ],
        [
            191.39999999999998,
            0,
            765.5999999999999,
            0,
            0,
            0,
            0,
            0,
            0,
            1537922700000
        ]
    ]
}


df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])
print(df)

Output:

     whC  whG    max  maxGen  hC  hG  maxVar  maxGen  avgTemp      eventTime
0  175.2    0  700.8       0   0   0       0       0        0  1537920900000
1  192.0    0  768.0       0   0   0       0       0        0  1537921800000
2  191.4    0  765.6       0   0   0       0       0        0  1537922700000
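
The eventTime values look like Unix timestamps in milliseconds. That is an assumption, since the question does not state the unit. If it holds, a sketch like the following would turn them into datetimes. The two-column frame is a cut-down stand-in for the df printed above.

```python
import pandas as pd

# Cut-down stand-in for the frame above (values copied from the question).
df = pd.DataFrame(
    [
        [175.2, 1537920900000],
        [192, 1537921800000],
        [191.39999999999998, 1537922700000],
    ],
    columns=["whC", "eventTime"],
)

# Assumption: eventTime is milliseconds since the Unix epoch (UTC).
df["eventTime"] = pd.to_datetime(df["eventTime"], unit="ms")
print(df)
```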

Edit: you can add the other keys as columns with a loop; each scalar value is repeated on every row:

# Add every remaining top-level value as a column (same value on all rows).
for k, v in data.items():
    if k not in ["intervalData", "intervalMetaData"]:
        df[k] = v
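
One thing to watch for: some top-level keys (max and avgTemp, for example) reuse names that already exist as interval columns, so assigning df[k] = v replaces the interval values in those columns. A possible workaround, sketched here with a hypothetical meta_ prefix that is not part of the original answer, keeps the two sets of values apart. The data dict is a cut-down stand-in for the one above.

```python
import pandas as pd

# Cut-down stand-in for the data dict above (values copied from it).
data = {
    "_id": "2026",
    "max": 0,
    "avgTemp": 0,
    "intervalMetaData": ["whC", "max", "avgTemp"],
    "intervalData": [[175.2, 700.8, 0], [192, 768, 0]],
}

df = pd.DataFrame(data["intervalData"], columns=data["intervalMetaData"])

# Hypothetical "meta_" prefix so that top-level keys never overwrite
# an interval column with the same name.
for k, v in data.items():
    if k not in ("intervalData", "intervalMetaData"):
        df[f"meta_{k}"] = v

print(df)
```

Skipping any key that is already in df.columns would be another option, at the cost of dropping those top-level values.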
Answered 2021-11-08T14:31:04.963