I'm trying to query a table that stores its data in a basic repeated field, like this:
+---+----------+------------+
| i | data.key | data.value |
+---+----------+------------+
| 0 | a | 1 |
| | b | 2 |
| 1 | a | 3 |
| | b | 4 |
| 2 | a | 5 |
| | b | 6 |
| 3 | a | 7 |
| | b | 8 |
+---+----------+------------+
I'm trying to figure out how to write a query that produces a result like this:
+---+----+----+
| i | a | b |
+---+----+----+
| 1 | 4 | 6 |
| 3 | 12 | 14 |
+---+----+----+
where each row is a non-overlapping sum (i.e. i=1 is the sum of rows i=0 and i=1), and the data has been pivoted so that each data.key is now a column.
Question 1:
I did my best to convert this answer to Standard SQL and ended up with:
SELECT
  i,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS `a`,
  (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS `b`
FROM
  `dataset.testing.dummy`
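This works, but I'm wondering whether there is a better way to do it, especially since it leads to a very verbose query once I try to use analytic functions: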
SELECT
  i,
  SUM(a) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `a`,
  SUM(b) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS `b`
FROM (
  SELECT
    i,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'a') AS `a`,
    (SELECT SUM(value) FROM UNNEST(data) WHERE key = 'b') AS `b`
  FROM
    `dataset.testing.dummy`)
ORDER BY
  i;
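One way the query above might be shortened, sketched here under stated assumptions: unnest once, pivot with conditional aggregation (`SUM(CASE WHEN key = 'a' THEN value END)`, or `SUM(IF(key = 'a', value, NULL))` in BigQuery), move the pivot into a `WITH` clause, and write the window specification only once as a named `WINDOW`. The sketch runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or later) against a hypothetical flattened table `dummy_flat(i, key, value)`. It stands in for what `FROM \`dataset.testing.dummy\`, UNNEST(data)` would yield in BigQuery, because SQLite has no repeated fields. It has not been run against BigQuery itself.

```python
import sqlite3

# Hypothetical flattened stand-in for `dataset.testing.dummy`:
# one row per element of the repeated field `data`.
ROWS = [(0, "a", 1), (0, "b", 2), (1, "a", 3), (1, "b", 4),
        (2, "a", 5), (2, "b", 6), (3, "a", 7), (3, "b", 8)]


def rolling_sums(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dummy_flat (i INTEGER, key TEXT, value INTEGER)")
    con.executemany("INSERT INTO dummy_flat VALUES (?, ?, ?)", rows)
    # Pivot once with conditional aggregation, then apply a single named window.
    query = """
        WITH pivoted AS (
            SELECT i,
                   SUM(CASE WHEN key = 'a' THEN value END) AS a,
                   SUM(CASE WHEN key = 'b' THEN value END) AS b
            FROM dummy_flat
            GROUP BY i
        )
        SELECT i,
               SUM(a) OVER w AS a,
               SUM(b) OVER w AS b
        FROM pivoted
        WINDOW w AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
        ORDER BY i
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(rolling_sums(ROWS))  # [(0, 1, 2), (1, 4, 6), (2, 8, 10), (3, 12, 14)]
```

One behavioural difference to keep in mind: in BigQuery, a comma join with `UNNEST(data)` drops rows whose array is empty, whereas the correlated subqueries keep them (with NULL sums). `LEFT JOIN UNNEST(data)` would keep them.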
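Question 2:
How do I write the ROWS or RANGE clause so that the resulting windows don't overlap? The last query gives me a rolling sum of the data, which isn't what I want: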
+---+----+----+
| i | a | b |
+---+----+----+
| 0 | 1 | 2 |
| 1 | 4 | 6 |
| 2 | 8 | 10 |
| 3 | 12 | 14 |
+---+----+----+
The rolling sum produces a result for every row, whereas I'm trying to reduce the number of rows returned.
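A window function always returns one row per input row, so `ROWS`/`RANGE` alone can't shrink the result. The sketch below uses a different technique: it assigns each row to a bucket with integer division (`i / 2` in SQLite, `DIV(i, 2)` in BigQuery), uses `GROUP BY` on that bucket, and labels each bucket by its last `i`. It assumes `i` is contiguous and starts at 0. With gaps, a `ROW_NUMBER()` would have to be computed first. As before, it runs on `sqlite3` against a hypothetical flattened table `dummy_flat(i, key, value)` rather than on BigQuery.

```python
import sqlite3

# Hypothetical flattened stand-in for `dataset.testing.dummy`:
# one row per element of the repeated field `data`.
ROWS = [(0, "a", 1), (0, "b", 2), (1, "a", 3), (1, "b", 4),
        (2, "a", 5), (2, "b", 6), (3, "a", 7), (3, "b", 8)]


def non_overlapping_sums(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dummy_flat (i INTEGER, key TEXT, value INTEGER)")
    con.executemany("INSERT INTO dummy_flat VALUES (?, ?, ?)", rows)
    # For INTEGER operands, / is integer division in SQLite, so
    # i = 0,1 -> bucket 0 and i = 2,3 -> bucket 1 (BigQuery: DIV(i, 2)).
    # dummy_flat.i is qualified so it cannot be confused with the alias i.
    query = """
        SELECT MAX(i) AS i,
               SUM(CASE WHEN key = 'a' THEN value END) AS a,
               SUM(CASE WHEN key = 'b' THEN value END) AS b
        FROM dummy_flat
        GROUP BY dummy_flat.i / 2
        ORDER BY i
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(non_overlapping_sums(ROWS))  # [(1, 4, 6), (3, 12, 14)]
```

If you'd rather keep the window query, filtering its output in an outer query to rows where `MOD(i, 2) = 1` gives the same two rows, under the same contiguity assumption.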