I am trying to use Dataprep to wrangle my data for reporting.

However, a field that should be an array ends up as a string in BigQuery.

Sample data:

{"name":"herman","age":34,"property":[{"address":"henry street","state":"vic"},{"address":"mount waverley","state":"vic"}]}
{"name":"Handry","age":61,"property":[{"address":"Balwyn","state":"vic"},{"address":"Clayton","state":"vic"}]}

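Basically, I just want to convert some fields to uppercase. (This is only a simple example; my real transformations are much more complex.) Here is the wrangle file: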

flatten col: property
unnest col: property keys: 'state','address' markLineage: true
drop col: property
derive value: upper(property_address) as: 'upper_property_address'
drop col: property_address
derive value: upper(property_state) as: 'upper_property_state'
drop col: property_state
nest col: upper_property_state,upper_property_address as: 'column1'
derive value: list(column1, 1000) group: name as: 'column2'
drop col: column1,upper_property_address,upper_property_state
deduplicate
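To make the intended result concrete, here is a minimal sketch in plain Python (not Dataprep) of what the recipe above is meant to produce: one row per person, with `column2` as a real array of records rather than a string. The function name `uppercase_properties` is hypothetical and only used for illustration.

```python
import json


def uppercase_properties(record):
    """Return a new record whose property entries are upper-cased.

    Mirrors the intent of the recipe: each property's address and state
    are upper-cased and collected into a list under 'column2'.
    """
    return {
        "name": record["name"],
        "age": record["age"],
        "column2": [
            {
                "upper_property_state": prop["state"].upper(),
                "upper_property_address": prop["address"].upper(),
            }
            for prop in record["property"]
        ],
    }


# One line of the sample data from the question.
sample = (
    '{"name":"Handry","age":61,"property":['
    '{"address":"Balwyn","state":"vic"},'
    '{"address":"Clayton","state":"vic"}]}'
)

result = uppercase_properties(json.loads(sample))
print(json.dumps(result))
```

Here `column2` stays a list of objects, which is the shape I would expect to map to a repeated record in BigQuery.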

The result in BigQuery (in the table) is:

{"age":"61","name":"Handry","column2":"[{\"upper_property_state\":\"VIC\",\"upper_property_address\":\"BALWYN\"},{\"upper_property_state\":\"VIC\",\"upper_property_address\":\"CLAYTON\"}]"}
{"age":"34","name":"herman","column2":"[{\"upper_property_state\":\"VIC\",\"upper_property_address\":\"HENRY STREET\"},{\"upper_property_state\":\"VIC\",\"upper_property_address\":\"MOUNT WAVERLEY\"}]"}
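The rows above show that `column2` holds a JSON-encoded string, not an array. As a rough illustration of the difference, the following Python sketch decodes that string back into a list of records. It is only a sketch of a possible post-processing step outside Dataprep, assuming the rows are available as newline-delimited JSON; the helper name `decode_column2` is hypothetical.

```python
import json


def decode_column2(row):
    """Return a copy of row with the JSON string in 'column2' parsed into a list."""
    decoded = dict(row)
    decoded["column2"] = json.loads(row["column2"])
    return decoded


# One output row as shown above; a raw string keeps the backslash escapes intact.
line = (
    r'{"age":"61","name":"Handry","column2":"[{\"upper_property_state\":\"VIC\",'
    r'\"upper_property_address\":\"BALWYN\"},{\"upper_property_state\":\"VIC\",'
    r'\"upper_property_address\":\"CLAYTON\"}]"}'
)

raw_row = json.loads(line)        # here column2 is still a str
row = decode_column2(raw_row)     # now column2 is a list of dicts

for prop in row["column2"]:
    print(prop["upper_property_address"], prop["upper_property_state"])
```

This needs an extra step after the Dataprep job, which is exactly what I would like to avoid.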

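I have also raised this in the Google Issue Tracker: https://issuetracker.google.com/issues/69773118

Has anyone run into this problem and found a workaround? I know we can query JSON stored as a string in BigQuery, as described in: How to query json stored as string in bigquery table?

But that complicates the queries, and I would like to avoid it.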
