I have two different Google Sheets spreadsheets:

One has 4 columns:

+------+------+------+------+
| Col1 | Col2 | Col5 | Col6 |
+------+------+------+------+
| ID1  | A    | B    | C    |
| ID2  | D    | E    | F    |
+------+------+------+------+

The other contains the 4 columns of the first file plus 2 more columns:

+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID3  | G    | H    | J    | K    | L    |
| ID4  | M    | N    | O    | P    | Q    |
+------+------+------+------+------+------+

I have configured them as federated sources in Google BigQuery, and now I need to create a view that combines the data from both tables.

Both tables have a Col1 column containing an ID. This ID is unique across all tables, so there is no duplicated data.

The result table I am looking for is:

+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID1  | A    | NULL | NULL | B    | C    |
| ID2  | D    | NULL | NULL | E    | F    |
| ID3  | G    | H    | J    | K    | L    |
| ID4  | M    | N    | O    | P    | Q    |
+------+------+------+------+------+------+

For the columns that the first file does not have, I expect NULL values.

I am using Standard SQL. Here is a statement that can be used to generate sample data:

#standardSQL

WITH table1 AS (
  SELECT "A" as Col1, "B" as Col2, "C" AS Col3
  UNION ALL
  SELECT "D" as Col1, "E" as Col2, "F" AS Col3
),

table2 AS (
  SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
  UNION ALL
  SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)

A simple UNION ALL does not work because the tables have different columns:

SELECT * FROM table1
UNION ALL
SELECT * FROM table2

Error: Queries in UNION ALL have mismatched column count; query 1 has 3 columns, query 2 has 5 columns at [17:1]

The wildcard operator is not a suitable approach either, because federated sources do not support it:

SELECT * FROM `table*`

Error: External tables cannot be queried through prefix

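Of course, this is sample data with only 3-5 columns; the real tables have 20-40 columns. So an approach that requires me to explicitly SELECT field by field is not practical.

Is there a working way to combine these two tables?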

2 Answers

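You can pass the rows through a UDF to handle the case where column names are not aligned by position or where the tables have different numbers of columns. Here is an example: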

CREATE TEMP FUNCTION CoerceRow(json_row STRING)
RETURNS STRUCT<Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING, Col5 STRING>
LANGUAGE js AS """
return JSON.parse(json_row);
""";

WITH table1 AS (
  SELECT "A" as Col5, "B" as Col3, "C" AS Col2
  UNION ALL
  SELECT "D" as Col5, "E" as Col3, "F" AS Col2
),

table2 AS (
  SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
  UNION ALL
  SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)
SELECT CoerceRow(json_row).*
FROM (
  SELECT TO_JSON_STRING(t1) AS json_row
  FROM table1 AS t1
  UNION ALL
  SELECT TO_JSON_STRING(t2) AS json_row
  FROM table2 AS t2
);

+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 |
+------+------+------+------+------+
| NULL | C    | B    | NULL | A    |
| NULL | F    | E    | NULL | D    |
| G    | H    | J    | K    | L    |
| M    | N    | O    | P    | Q    |
+------+------+------+------+------+

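Note that the CoerceRow function needs to declare the explicit row type that you want in the output. Beyond that, the columns of the tables being unioned are simply matched by name.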
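
As a rough sketch of how this could look for the six-column layout in the question, the variant below widens the declared STRUCT to Col1 through Col6 and uses inline sample rows that mirror the two sheets. The CTE names table1 and table2 stand in for the federated tables and are assumptions, as is treating every column as STRING.

```sql
-- Sketch only: same technique as above, with the output row type widened to six columns.
CREATE TEMP FUNCTION CoerceRow(json_row STRING)
RETURNS STRUCT<Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING, Col5 STRING, Col6 STRING>
LANGUAGE js AS """
  // Properties missing from the parsed object come back as NULL struct fields.
  return JSON.parse(json_row);
""";

WITH table1 AS (
  -- Stand-in for the 4-column sheet (no Col3, no Col4).
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col5, "C" AS Col6
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  -- Stand-in for the 6-column sheet.
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q"
)
SELECT CoerceRow(json_row).*
FROM (
  -- Each row is serialized with its own column names, so alignment is by name.
  SELECT TO_JSON_STRING(t1) AS json_row FROM table1 AS t1
  UNION ALL
  SELECT TO_JSON_STRING(t2) AS json_row FROM table2 AS t2
);
```

This should return the four rows of the desired result table, with NULL in Col3 and Col4 for ID1 and ID2; row order is not guaranteed without an ORDER BY. One caveat, as far as I know: a view definition cannot reference a temporary function, so for the view in the question the function would probably have to be created as a persistent one.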

Answered 2018-01-10T17:18:07.620

Is there a working way to combine these two tables?

#standardSQL
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2  

You can check this with your example:

#standardSQL
WITH table1 AS (
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col3, "C" AS Col4 
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q" 
)
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
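
Because UNION ALL matches columns by position, appending the NULL columns at the end lines up only when the missing columns are the last ones. In the question's layout the first sheet lacks Col3 and Col4, so the placeholders would need to sit in the middle. A minimal sketch under that assumption follows; the CTEs are stand-ins for the federated tables, the STRING type is assumed, and it does require listing the columns of the narrower table explicitly.

```sql
WITH table1 AS (
  -- Stand-in for the 4-column sheet (no Col3, no Col4).
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col5, "C" AS Col6
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  -- Stand-in for the 6-column sheet.
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q"
)
-- Put typed NULL placeholders in the positions of the missing columns.
SELECT Col1, Col2,
       CAST(NULL AS STRING) AS Col3,
       CAST(NULL AS STRING) AS Col4,
       Col5, Col6
FROM table1
UNION ALL
SELECT * FROM table2;
```

Only the narrower table needs the explicit list; the wider one can still use SELECT *, provided its column order matches the list.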
Answered 2018-01-10T16:57:59.807