I have this JSON file in a data lake that looks like this:
{
    "id":"398507",
    "contenttype":"POST",
    "posttype":"post",
    "uri":"http://twitter.com/etc",
    "title":null,
    "profile":{
        "@class":"PublisherV2_0",
        "name":"Company",
        "id":"2163171",
        "profileIcon":"https://pbs.twimg.com/image",
        "profileLocation":{
            "@class":"DocumentLocation",
            "locality":"Toronto",
            "adminDistrict":"ON",
            "countryRegion":"Canada",
            "coordinates":{
                "latitude":43.7217,
                "longitude":-31.432},
            "quadKey":"000000000000000"},
        "displayName":"Name",
        "externalId":"00000000000"},
    "source":{
        "name":"blogs",
        "id":"18",
        "param":"Twitter"},
    "content":{
        "text":"Description of post"},
    "language":{
        "name":"English",
        "code":"en"},
    "abstracttext":"More Text and links",
    "score":{}
}
To get the data into my application, I have to convert the JSON to a string with the following code:
DECLARE @input string = @"/MSEStream/{*}.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Read each input line as one string column; '\b' does not occur in the data.
@allposts =
    EXTRACT jsonString string
    FROM @input
    USING Extractors.Text(delimiter:'\b', quoting:true);

// Turn the top-level JSON properties into a key/value map.
@extractedrows =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er
    FROM @allposts;

// Nested objects such as "profile" come back as JSON strings.
@result =
    SELECT er["id"] AS postID,
           er["contenttype"] AS contentType,
           er["posttype"] AS postType,
           er["uri"] AS uri,
           er["title"] AS Title,
           er["acquisitiondate"] AS acquisitionDate,
           er["modificationdate"] AS modificationDate,
           er["publicationdate"] AS publicationDate,
           er["profile"] AS profile
    FROM @extractedrows;

OUTPUT @result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();
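This outputs the JSON to a readable .csv file, and when I download the file all the data displays correctly. My problem is getting at the data inside profile. Because that part of the JSON is now a string, I can't seem to extract any of its fields and put them into a variable to use. Is there a way to do this, or do I need to look into other options for reading the data?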
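
One direction that might be worth trying, offered as a sketch rather than a confirmed fix: since er["profile"] is itself a JSON string, JsonTuple from the same sample assembly can be applied to it a second time, which should yield a map of the nested profile fields. The rowset names @profiles and @profileFields and the output path are hypothetical, chosen only for illustration; the rest mirrors the script above.

```
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @input string = @"/MSEStream/{*}.json";

@allposts =
    EXTRACT jsonString string
    FROM @input
    USING Extractors.Text(delimiter:'\b', quoting:true);

@extractedrows =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er
    FROM @allposts;

// er["profile"] holds the nested object as a JSON string, so parse it again.
// (@profiles is a hypothetical rowset name.)
@profiles =
    SELECT er["id"] AS postID,
           Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(er["profile"]) AS profile
    FROM @extractedrows;

// Index into the second map to pull out individual profile fields.
// "profileLocation" is still a JSON string at this level.
@profileFields =
    SELECT postID,
           profile["name"] AS profileName,
           profile["id"] AS profileId,
           profile["displayName"] AS displayName,
           profile["profileLocation"] AS profileLocation
    FROM @profiles;

// Hypothetical output path.
OUTPUT @profileFields
TO "/ProcessedQueries/profile_fields.csv"
USING Outputters.Csv();
```

If this works for profile, the same step could presumably be repeated on profileLocation to reach locality or coordinates, at the cost of one extra SELECT per nesting level.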