I received an Avro file in my Data Lake Store via Stream Analytics and an Event Hub using Capture.
The structure of the file looks like this:
[{"id":1,"pid":"abc","value":"1","utctimestamp":1537805867},{"id":6569,"pid":"1E014000","value":"-5.8","utctimestamp":1537805867}]
[{"id":2,"pid":"cde","value":"77","utctimestamp":1537772095},{"id":6658,"pid":"02002001","value":"77","utctimestamp":1537772095}]
I used this script:
@rs =
    EXTRACT
        SequenceNumber long,
        Offset string,
        EnqueuedTimeUtc string,
        Body byte[]
    FROM @input_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
    {
        ""type"": ""record"",
        ""name"": ""EventData"",
        ""namespace"": ""Microsoft.ServiceBus.Messaging"",
        ""fields"": [
            {
                ""name"": ""SequenceNumber"",
                ""type"": ""long""
            },
            {
                ""name"": ""Offset"",
                ""type"": ""string""
            },
            {
                ""name"": ""EnqueuedTimeUtc"",
                ""type"": ""string""
            },
            {
                ""name"": ""SystemProperties"",
                ""type"": {
                    ""type"": ""map"",
                    ""values"": [
                        ""long"",
                        ""double"",
                        ""string"",
                        ""bytes""
                    ]
                }
            },
            {
                ""name"": ""Properties"",
                ""type"": {
                    ""type"": ""map"",
                    ""values"": [
                        ""long"",
                        ""double"",
                        ""string"",
                        ""bytes"",
                        ""null""
                    ]
                }
            },
            {
                ""name"": ""Body"",
                ""type"": [
                    ""null"",
                    ""bytes""
                ]
            }
        ]
    }
    ");

@jsonify =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;

@cnt =
    SELECT message["id"] AS id,
           message["id2"] AS pid,
           message["value"] AS value,
           message["utctimestamp"] AS utctimestamp,
           message["extra"] AS extra
    FROM @jsonify;

OUTPUT @cnt
TO @output_file
USING Outputters.Text(quoting: false);
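The script produces a file, but it contains only the separator commas and no values.
How can I extract/transform this structure so that I can output it as a flat 4-column CSV file?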