
I'm trying to query a table that stores its data in a basic repeated field, like this:

+---+----------+------------+
| i | data.key | data.value |
+---+----------+------------+
| 0 | a        |          1 |
|   | b        |          2 |
| 1 | a        |          3 |
|   | b        |          4 |
| 2 | a        |          5 |
|   | b        |          6 |
| 3 | a        |          7 |
|   | b        |          8 |
+---+----------+------------+

I'm trying to figure out how to write a query that returns a result like this:

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+

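Each row is a non-overlapping sum (i.e. the row for i=1 is the sum of rows i=0 and i=1), and the data has been pivoted so that each data.key is now its own column.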

Question 1:

I did my best to convert this answer to standard SQL and ended up with:

SELECT
  i,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') as `a`,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') as `b`
FROM
  `dataset.testing.dummy`

This works, but I'm wondering whether there is a better way to do it, especially since it leads to a very verbose query once I add analytic functions:

SELECT
  i,
  SUM(a) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `a`,
  SUM(b) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `b`
FROM (
  SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') as `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') as `b`
  FROM
    `dataset.testing.dummy`)
ORDER BY
  i;
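For comparison, here is a hedged sketch (not from the original post) of pivoting with a single UNNEST and conditional aggregation instead of one correlated subquery per key. The inline Input CTE stands in for `dataset.testing.dummy`, and its schema (i INT64, data ARRAY<STRUCT<key STRING, value INT64>>) is an assumption based on the table shown above.

```sql
#standardSQL
-- Sketch: pivot the repeated field with one UNNEST plus conditional aggregation.
-- Assumed schema: i INT64, data ARRAY<STRUCT<key STRING, value INT64>>.
WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
)
SELECT
  i,
  -- Elements whose key does not match yield NULL, which SUM ignores.
  SUM(IF(d.key = 'a', d.value, NULL)) AS a,
  SUM(IF(d.key = 'b', d.value, NULL)) AS b
FROM Input
-- One joined row per array element; GROUP BY folds them back to one row per i.
CROSS JOIN UNNEST(Input.data) AS d
GROUP BY i
ORDER BY i;
```

One difference from the scalar-subquery version: CROSS JOIN UNNEST drops any row whose data array is empty, whereas LEFT JOIN UNNEST(Input.data) AS d would keep it with NULL sums. The query can be used as the inner subselect of the analytic query in place of the per-key subqueries.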

Question 2:

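How do I write the ROWS or RANGE clause so that the resulting windows do not overlap? The last query gives me a rolling sum of the data, which is not what I'm after: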

+---+----+----+
| i | a  | b  |
+---+----+----+
| 0 |  1 |  2 |
| 1 |  4 |  6 |
| 2 |  8 | 10 |
| 3 | 12 | 14 |
+---+----+----+

The rolling sum produces a result for every row, whereas I'm trying to reduce the number of rows returned.
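Because an analytic (window) function always returns one output row per input row, a frame clause alone cannot reduce the row count. A hedged alternative sketch (not from the original post) is to assign each pair of rows to a bucket with DIV(i, 2) and aggregate with GROUP BY. It assumes i is a gap-free integer sequence starting at 0; the Input CTE is illustrative data mirroring the table above.

```sql
#standardSQL
-- Sketch: non-overlapping buckets of two rows via GROUP BY instead of a window frame.
-- Assumes i is dense and starts at 0, so DIV(i, 2) maps {0,1} -> 0, {2,3} -> 1, ...
WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
),
Pivoted AS (
  SELECT
    i,
    DIV(i, 2) AS bucket,  -- bucket id shared by each pair of consecutive rows
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS a,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS b
  FROM Input
)
SELECT
  MAX(i) AS i,  -- label each bucket with its last row, as in the desired output
  SUM(a) AS a,
  SUM(b) AS b
FROM Pivoted
GROUP BY bucket
ORDER BY 1;
```

With the sample data this returns one row per bucket, labelled i=1 and i=3, instead of one row per input row. If i can have gaps, DIV(ROW_NUMBER() OVER (ORDER BY i) - 1, 2) could serve as the bucket expression instead.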


1 Answer


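Using a temporary SQL function and a named window helps cut down the verbosity. I did, however, have to use another subselect to apply the filter on i. Here is a self-contained example: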

#standardSQL
CREATE TEMP FUNCTION SumKey(
    data ARRAY<STRUCT<key STRING, value INT64>>,
    target_key STRING) AS (
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = target_key) 
);

WITH Input AS (
  SELECT
    0 AS i,
    ARRAY<STRUCT<key STRING, value INT64>>[('a', 1), ('b', 2)] AS data UNION ALL
  SELECT 1, ARRAY<STRUCT<key STRING, value INT64>>[('a', 3), ('b', 4)] UNION ALL
  SELECT 2, ARRAY<STRUCT<key STRING, value INT64>>[('a', 5), ('b', 6)] UNION ALL
  SELECT 3, ARRAY<STRUCT<key STRING, value INT64>>[('a', 7), ('b', 8)]
)
SELECT * FROM (
  SELECT
    i,
    SUM(a) OVER W AS a,
    SUM(b) OVER W AS b
  FROM (
    SELECT
      i,
      SumKey(data, 'a') AS a,
      SumKey(data, 'b') AS b
    FROM Input
  )
  WINDOW W AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
)
WHERE MOD(i, 2) = 1
ORDER BY i;

This results in:

+---+----+----+
| i | a  | b  |
+---+----+----+
| 1 |  4 |  6 |
| 3 | 12 | 14 |
+---+----+----+

answered 2017-06-20T17:01:32.870