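[Update: the problem below appears to affect only Databricks 10.x when used with lateral view explode (expanding an array); the query works as expected in Databricks 9.x]
When I query a df loaded from JSON as structs, the following error is shown: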
Error running query
Ambiguous reference to fields StructField(value,StringType,true), StructField(value,StringType,true)
This happens because I am querying two fields that have the same name (but different paths):
select
postlist.string_map_data.Likes.value as PostLikeCount,
postlist.string_map_data.Comments.value as PostCommentCount
from newschema.insta_post_insights
lateral view explode(organic_insights_posts) AS postlist
So the two duplicated "value" fields cause this error. Is there any Spark SQL approach that would produce a result like this?
PostLikeCount | PostCommentCount |
---|---|
1 | 3 |
2 | 4 |
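To make the intended result concrete, here is a minimal sketch in plain Python (not Spark, and not a fix for the error) that walks a trimmed record of the export and picks the two "value" fields by their full paths. The `SAMPLE` string, the `extract_counts` helper, and the counts 1/3 and 2/4 are illustrative assumptions, chosen to match the table above; only the key names come from the actual export.

```python
import json

# Hypothetical sample trimmed from the Instagram export structure;
# the counts are illustrative and chosen to match the desired table.
SAMPLE = """
{
  "organic_insights_posts": [
    {"string_map_data": {
      "Likes":    {"href": "", "value": "1", "timestamp": 0},
      "Comments": {"href": "", "value": "3", "timestamp": 0}
    }},
    {"string_map_data": {
      "Likes":    {"href": "", "value": "2", "timestamp": 0},
      "Comments": {"href": "", "value": "4", "timestamp": 0}
    }}
  ]
}
"""


def extract_counts(doc):
    """Return one (likes, comments) tuple per post.

    Both leaf fields are named "value", but each is reached through a
    different parent key, so the full paths are unambiguous.
    """
    rows = []
    for post in doc["organic_insights_posts"]:
        fields = post["string_map_data"]
        rows.append((fields["Likes"]["value"], fields["Comments"]["value"]))
    return rows


rows = extract_counts(json.loads(SAMPLE))
for likes, comments in rows:
    print(f"{likes} | {comments} |")
```

The point of the sketch is only that `Likes.value` and `Comments.value` are distinct paths, which is what the Spark SQL query is expected to resolve as well.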
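Renaming the fields in the source df would probably work best, but that would affect the ETL process, which is not ideal in this case. I would appreciate any help. The JSON data is a typical Instagram data export, as shown below: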
"organic_insights_posts": [
{"string_map_data": {
"Creation timestamp": {
"href": "",
"value": "",
"timestamp": 123456789
},
"Profile visits": {
"href": "",
"value": "--",
"timestamp": 0
},
"Impressions": {
"href": "",
"value": "--",
"timestamp": 0
},
"Follows": {
"href": "",
"value": "--",
"timestamp": 0
},
"Accounts reached": {
"href": "",
"value": "--",
"timestamp": 0
},
"Saves": {
"href": "",
"value": "1",
"timestamp": 0
},
"Likes": {
"href": "",
"value": "1",
"timestamp": 0
},
"Comments": {
"href": "",
"value": "1",
"timestamp": 0
}