google-bigquery - Extracting information from a JSON string in BigQuery

Question

I have a table in BigQuery that stores the results of a classification algorithm. The table schema is INT, STRING, and it looks like this:

ID	Output
1001	{'cider': 0.7, 'coffee': 0.2, 'juice': 0.1}
1002	{'black coffee': 0.9, 'tea': 0.1}

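The question is how to get the first (or second, or any Nth) element of each string together with its score. JSON_EXTRACT doesn't look like it will work, and this could probably be done with JavaScript. I'm wondering what an elegant solution would look like.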

score 1 · Accepted Answer

Consider the query below:

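```sql
select ID, 
  trim(split(kv, ':')[offset(0)], " '") element,   -- key, with surrounding spaces/quotes stripped
  cast(split(kv, ':')[offset(1)] as float64) score,  -- numeric score after the colon
  element_position                                   -- 0-based position of the pair within Output
from `project.dataset.table` t,
-- one row per 'key': value pair found in the string with its braces trimmed off
unnest(regexp_extract_all(trim(Output, '{}'), r"'[^':']+'\s?:\s?[^,]+")) kv with offset as element_position
```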

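Applied to the sample data in your question, this returns one row per key/score pair, with element_position numbering the pairs from 0 within each ID, so filtering on element_position gives you the first, second, or any Nth element.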
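To try the same query without a real table, here is a self-contained sketch that inlines the question's two sample rows in a WITH clause (assuming, as the regex requires, that every key is wrapped in single quotes) and keeps only the second pair per ID. The extra TRIM around the score is a defensive addition of mine, not part of the answer above.

```sql
-- Sketch: the answer's query run against inline sample rows,
-- keeping only the Nth pair per ID (element_position is 0-based).
WITH t AS (
  SELECT 1001 AS ID, "{'cider': 0.7, 'coffee': 0.2, 'juice': 0.1}" AS Output UNION ALL
  SELECT 1002, "{'black coffee': 0.9, 'tea': 0.1}"
)
SELECT
  ID,
  TRIM(SPLIT(kv, ':')[OFFSET(0)], " '") AS element,
  -- TRIM added defensively to drop the space that follows the colon
  CAST(TRIM(SPLIT(kv, ':')[OFFSET(1)]) AS FLOAT64) AS score,
  element_position
FROM t,
  UNNEST(REGEXP_EXTRACT_ALL(TRIM(Output, '{}'), r"'[^':']+'\s?:\s?[^,]+")) AS kv
  WITH OFFSET AS element_position
WHERE element_position = 1  -- 0 = first pair, 1 = second pair, and so on
```

Changing the WHERE value (or removing it) is all that's needed to pick a different position.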

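Note: if you prefer, you can use a less verbose UNNEST that simply splits the braces-stripped string on commas (SPLIT's default delimiter), though that breaks if a key itself contains a comma: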

```sql
unnest(split(trim(Output, '{}'))) kv with offset as element_position
```
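Since the question also mentions JavaScript, here is an alternative sketch (not part of the accepted answer) using a temporary JavaScript UDF. `parse_scores` is a hypothetical name, and the approach assumes no key contains a single or double quote, so swapping `'` for `"` turns each string into valid JSON.

```sql
-- Hypothetical helper: parses "{'key': score, ...}" into an ordered array of structs.
CREATE TEMP FUNCTION parse_scores(s STRING)
RETURNS ARRAY<STRUCT<element STRING, score FLOAT64>>
LANGUAGE js AS r"""
  if (s === null) return null;
  // Assumption: keys contain no quote characters, so swapping quotes yields valid JSON.
  const obj = JSON.parse(s.replace(/'/g, '"'));
  // Object.keys keeps insertion order for non-numeric string keys.
  return Object.keys(obj).map(k => ({element: k, score: obj[k]}));
""";

SELECT ID, p.element, p.score, element_position
FROM `project.dataset.table`,
  UNNEST(parse_scores(Output)) AS p WITH OFFSET AS element_position;
```

The regex version avoids UDF overhead; the UDF version is easier to extend if the strings ever become real JSON.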
