
[Update: the issue below appears to affect only Databricks 10.x when using lateral view explode (expanding an array); the same query works as expected on Databricks 9.x]

When I query a df loaded from JSON as a struct, I get the following error:

Error running query
Ambiguous reference to fields StructField(value,StringType,true), StructField(value,StringType,true)

This happens because I am querying two fields that have the same name (but different paths):

select
postlist.string_map_data.Likes.value as PostLikeCount,
postlist.string_map_data.Comments.value as PostCommentCount
from newschema.insta_post_insights
    lateral view explode(organic_insights_posts) AS postlist

So the two duplicate "value" fields cause this error. Is there any Spark SQL query approach that would produce output like this?

PostLikeCount PostCommentCount
1 3
2 4

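Renaming the fields in the source df would probably work best, but that would affect the ETL process, so it is not ideal in this case. I would appreciate any help. The JSON data is a typical Instagram data export, as shown below: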

"organic_insights_posts": [
    {"string_map_data": {
        "Creation timestamp": {
          "href": "",
          "value": "",
          "timestamp": 123456789
        },
        "Profile visits": {
          "href": "",
          "value": "--",
          "timestamp": 0
        },
        "Impressions": {
          "href": "",
          "value": "--",
          "timestamp": 0
        },
        "Follows": {
          "href": "",
          "value": "--",
          "timestamp": 0
        },
        "Accounts reached": {
          "href": "",
          "value": "--",
          "timestamp": 0
        },
        "Saves": {
          "href": "",
          "value": "1",
          "timestamp": 0
        },
        "Likes": {
          "href": "",
          "value": "1",
          "timestamp": 0
        },
        "Comments": {
          "href": "",
          "value": "1",
          "timestamp": 0
        }
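
To make the intended result concrete, here is a minimal plain-Python sketch of the extraction I am after. It is not Spark SQL and does not address the error itself; it only shows that the two "value" leaves are distinguished by their parent keys. The `SAMPLE` document and the `post_counts` helper are hypothetical: the records are trimmed to the two metrics of interest and the values are made up to match the desired output shown earlier.

```python
import json

# Hypothetical sample: two posts shaped like the export above, trimmed to
# the two metrics of interest. The values are invented for illustration.
SAMPLE = json.loads("""
{
  "organic_insights_posts": [
    {"string_map_data": {
      "Likes":    {"href": "", "value": "1", "timestamp": 0},
      "Comments": {"href": "", "value": "3", "timestamp": 0}
    }},
    {"string_map_data": {
      "Likes":    {"href": "", "value": "2", "timestamp": 0},
      "Comments": {"href": "", "value": "4", "timestamp": 0}
    }}
  ]
}
""")


def post_counts(doc):
    """Return one (PostLikeCount, PostCommentCount) pair per post."""
    rows = []
    for post in doc["organic_insights_posts"]:
        metrics = post["string_map_data"]
        # Both leaves are named "value"; the parent key disambiguates them.
        rows.append((metrics["Likes"]["value"], metrics["Comments"]["value"]))
    return rows


print(post_counts(SAMPLE))
```

Note that the values come back as strings, just as they are stored in the export, so a cast would still be needed to treat them as counts.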