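I have millions of JSON documents stored in a single VARIANT column in Snowflake. They are in the format below, but the number of rows in each JSON varies.
Can anyone give me some guidance on how to extract the data into a flat table? I'm new to working with JSON files, and between the inconsistent row counts and the lack of any indicator that defines the object name, I'm confused.
Here is a sample JSON: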
{
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit": 0.2714572,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP": 0,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD Diff Press Gain SP unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP": 0,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AD ROP unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Calculated Pipe Displacement unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Cumulative Delta Displacement unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.FD Svy Quality unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.GWEX SampleFlow unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.MP3_STK unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.PT Correction unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Pit 11 Jumps unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.ROP - #1 Ref Time unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL": 8.732743,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL": 16.13105,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK4_VOL unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip": 1.3,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Time On Slip unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque": -999.25,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.WPDA - Mud Motor Torque unit": "",
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor": 4.167005,
"Edge 93 Belgium 43-23-19 1932.Time_1_Avg.Washout Factor unit": "",
"DeviceId": "streamingdevice",
"EventEnqueuedUtcTime": "2020-05-04T22:12:21.5310000Z",
"EventProcessedUtcTime": "2020-05-04T22:12:35.6868329Z",
"IoTHub": {
"ConnectionDeviceGenerationId": "637199801617320690",
"ConnectionDeviceId": "streamingdevice",
"CorrelationId": null,
"EnqueuedTime": "2020-05-04T22:12:21.0000000",
"MessageId": null,
"StreamId": null
},
"PartitionId": 1,
"Timestamp": "2019-10-30 13:48:05.000000"
}
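"Edge 93 Belgium 43-23-19 1932" is the object name; each JSON is for a single object.
"Time_1_Avg.AB2 Weight on Bit" is the reading type, which is essentially made up of Tag1.Tag2.
The last part of each line is the reading value.
The Timestamp at the bottom of the JSON is the time of the reading.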
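To make the key structure concrete, here is a rough Python sketch (outside Snowflake) of the breakdown I have in mind. It assumes the object name and Tag1 never contain a period, and that each "... unit" key holds the unit for the reading of the same name. The function name `flatten_record` and the output column names are placeholders I made up for illustration.

```python
import json

UNIT_SUFFIX = " unit"  # assumption: unit keys are the reading key plus this suffix


def flatten_record(raw):
    """Turn one JSON document into a list of flat rows, one per reading."""
    record = json.loads(raw)
    timestamp = record.get("Timestamp")
    rows = []
    for key, value in record.items():
        # Reading keys look like "<object name>.<Tag1>.<Tag2>".
        # Split on the first two periods only, in case Tag2 contains one.
        parts = key.split(".", 2)
        if len(parts) != 3 or key.endswith(UNIT_SUFFIX):
            # Skips metadata such as DeviceId and IoTHub, and the unit keys,
            # which are attached to their reading below instead.
            continue
        object_name, tag1, tag2 = parts
        rows.append({
            "object_name": object_name,
            "tag1": tag1,
            "tag2": tag2,
            "value": value,
            "unit": record.get(key + UNIT_SUFFIX, ""),
            "timestamp": timestamp,
        })
    return rows


# A cut-down version of the sample JSON above.
sample = json.dumps({
    "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit": 0.2714572,
    "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.AB2 Weight on Bit unit": "",
    "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL": 8.732743,
    "Edge 93 Belgium 43-23-19 1932.Time_1_Avg.TANK2_VOL unit": "",
    "DeviceId": "streamingdevice",
    "IoTHub": {"ConnectionDeviceId": "streamingdevice", "MessageId": None},
    "PartitionId": 1,
    "Timestamp": "2019-10-30 13:48:05.000000",
})

rows = flatten_record(sample)
for row in rows:
    print(row)
```

Each reading becomes one row carrying the object name, both tags, the value, and the shared timestamp, so a JSON with more or fewer readings simply produces more or fewer rows.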
This section is not needed:
"DeviceId": "streamingdevice", "EventEnqueuedUtcTime": "2020-05-04T22:12:21.5310000Z", "EventProcessedUtcTime": "2020-05-04T22:12:35.6868329Z", "IoTHub": { "ConnectionDeviceGenerationId": "637199801617320690", "ConnectionDeviceId": "streamingdevice", "CorrelationId": null, "EnqueuedTime": "2020-05-04T22:12:21.0000000", "MessageId": null, "StreamId": null }, "PartitionId": 1,
The ideal output for this data would be:
But just getting something like this would be really helpful: