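Suppose I want to use a JavaScript UDF to process a table with a nested structure (for example, the sample GitHub commits table). I may want to change which fields the UDF looks at as I iterate on its implementation, so I decide to pass the entire table row to it. My UDF ends up looking like this: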

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
  input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
               author STRUCT<name STRING, email STRING, ...>>)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
[UDF content here]
""";

I then call the function with a query like this:

SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;

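The most cumbersome part of the UDF declaration is the input struct, because I have to list every nested field and its type. Is there a better way?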

1 Answer

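You can use TO_JSON_STRING to convert arbitrary structs and arrays to JSON, then parse the string into an object inside the UDF for further processing. For example: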

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
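// Parse the JSON produced by TO_JSON_STRING back into a plain object.
var row = JSON.parse(json_str);
// Keys of the result must match the field names in the RETURNS struct.
var result = new Object();
result['parent'] = row.parent;
result['author_name'] = row.author.name;
result['diff_count'] = row.difference.length;
return result;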
""";

SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
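
Since the UDF body is plain JavaScript, its logic can be checked outside BigQuery before pasting it into the function. Below is a minimal sketch, assuming Node.js. `getCommitStats` and the sample row are hypothetical stand-ins, and the null guards are an addition that the UDF above does not have: TO_JSON_STRING renders a NULL struct as JSON null, so `row.author.name` would throw for such a row.

```javascript
// Hypothetical standalone copy of the UDF body, so it can run under Node.js.
function getCommitStats(jsonStr) {
  var row = JSON.parse(jsonStr);
  // Guard against NULL structs/arrays, which arrive as JSON null.
  var author = row.author || {};
  var difference = row.difference || [];
  return {
    parent: row.parent || [],
    author_name: author.name === undefined ? null : author.name,
    diff_count: difference.length
  };
}

// Made-up row shaped like the output of TO_JSON_STRING(t); not real data.
var sample = JSON.stringify({
  commit: 'abc123',
  parent: ['def456'],
  author: { name: 'Example Author', email: 'author@example.com' },
  difference: [
    { old_path: 'a.txt', new_path: 'a.txt' },
    { old_path: 'b.txt', new_path: 'b.txt' }
  ]
});

console.log(getCommitStats(sample));
// A row whose author struct is NULL no longer throws.
console.log(getCommitStats('{"parent": [], "author": null, "difference": []}'));
```

Once the logic behaves as expected, the function body (without the wrapper and the sample data) can go between the triple quotes of the CREATE TEMP FUNCTION statement.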

If you want to reduce the number of columns scanned, you can pass a struct containing only the relevant columns to TO_JSON_STRING:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
// row only has the keys named in the STRUCT passed to TO_JSON_STRING.
var row = JSON.parse(json_str);
var result = new Object();
result['parent'] = row.parent;
result['author_name'] = row.author.name;
result['diff_count'] = row.difference.length;
return result;
""";

SELECT
  GetCommitStats(TO_JSON_STRING(
    STRUCT(parent, author, difference)
  )).*
FROM `bigquery-public-data.github_repos.sample_commits`;
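
One consequence of narrowing the input is that the parsed object only has the keys named in the STRUCT, so a field the UDF reads but the query omits shows up as `undefined` at run time instead of being caught by the function signature. Below is a minimal sketch of a guard, assuming Node.js for local testing. `requireFields` is a hypothetical helper, not a BigQuery function, and the payload is made up.

```javascript
// Hypothetical helper: fail early with a clear message when the JSON
// payload lacks a key that the UDF body expects to read.
function requireFields(row, names) {
  var missing = names.filter(function (name) {
    return !(name in row);
  });
  if (missing.length > 0) {
    throw new Error('Missing fields in input JSON: ' + missing.join(', '));
  }
  return row;
}

// Made-up payload shaped like TO_JSON_STRING(STRUCT(parent, author)),
// i.e. the query forgot to include `difference`.
var narrowed = JSON.parse('{"parent": [], "author": {"name": "Example Author"}}');

try {
  requireFields(narrowed, ['parent', 'author', 'difference']);
} catch (e) {
  console.log(e.message); // prints "Missing fields in input JSON: difference"
}
```

Inside the UDF, the same check could run right after JSON.parse. I would expect an uncaught exception there to fail the query with that message, but that is worth confirming against your own query.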
answered 2017-05-17T18:06:32.647