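I'm trying to run a query that lists all the distinct markets that users have submitted for my dataset. The values in the markets column are already arrays. When I run the query below, I get an array of arrays, and some markets can be listed more than once, because the DISTINCT clause compares whole arrays rather than the values inside them. For example, if I group ['New York'] and ['New York', 'Chicago'], my goal is to get ['New York', 'Chicago'] as the result, but I currently get [['New York'], ['New York', 'Chicago']]. Any help is appreciated.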

SELECT 
  s.submitter_id,
  ARRAY_AGG(DISTINCT s.markets)
FROM 
  analytics.submissions AS s
GROUP BY 1

2 Answers

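A simple approach is to flatten the arrays first, so that DISTINCT is applied to individual markets rather than to whole arrays: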

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT 
  a.submitter_id,
  ARRAY_AGG(DISTINCT a.market)
FROM (
    SELECT s.submitter_id
        ,f.value AS market
    FROM data AS s,
    LATERAL FLATTEN(input => s.markets) f
) AS a
GROUP BY 1;
answered 2020-03-17T20:30:27.203

A variant using a JavaScript UDF:

WITH data AS (
 SELECT submitter_id, split(markets,';') AS markets 
 FROM VALUES (1,'new york'), (1,'new york;chicago') s(submitter_id, markets)
)
SELECT 
  submitter_id,
  array_flat_distinct(ARRAY_AGG(distinct markets))
from data
group by 1;

where the UDF is defined as:

create or replace function array_flat_distinct("a" array)
returns array
language javascript
as
$$
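    // Flatten the array of arrays, then de-duplicate with a Set.
    // The [] seed keeps reduce from throwing on an empty input array.
    return [...new Set(a.reduce((b,c)=>[...b,...c], []))]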
$$
;
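The flatten-and-dedupe logic in the UDF body can be checked outside Snowflake as plain JavaScript. This is only a sketch: the function name `arrayFlatDistinct` and the sample input are made up for illustration, and it passes an initial `[]` to `reduce` so that an empty input returns an empty array instead of throwing.

```javascript
// Hypothetical standalone version of the UDF body, for trying the logic in Node.js.
function arrayFlatDistinct(a) {
  // Concatenate the inner arrays into one flat array.
  // The [] seed is an addition here so that an empty input does not throw.
  const flattened = a.reduce((acc, inner) => [...acc, ...inner], []);
  // A Set keeps only the first occurrence of each value, in insertion order.
  return [...new Set(flattened)];
}

// Same shape as the output of ARRAY_AGG(DISTINCT markets) in the question.
const aggregated = [["New York"], ["New York", "Chicago"]];
console.log(arrayFlatDistinct(aggregated)); // [ 'New York', 'Chicago' ]
```

Note that the Set compares strings exactly, so 'new york' and 'New York' would count as two different markets.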
answered 2020-03-17T22:25:46.077