
I created the following query in BigQuery:

SELECT
  date,
  userId,
  SUM(totals.visits) totalvisits,
  GROUP_CONCAT(device.deviceCategory) sequentialdevice
FROM (
  SELECT
    date,
    visitStartTime,
    customDimensions.value userId,
    totals.visits,
    device.deviceCategory
  FROM
    TABLE_DATE_RANGE([164345793.ga_sessions_], TIMESTAMP('20171127'), CURRENT_TIMESTAMP())
  WHERE
    customDimensions.index = 1
    AND customDimensions.value CONTAINS "hip|"
  GROUP BY
    date,
    visitStartTime,
    userId,
    totals.visits,
    device.deviceCategory
  HAVING
    userId="hip|7e4fbce9-bbfb-4677-aab0-dcd02851fdb4"
  ORDER BY
    date ASC,
    visitStartTime ASC)
GROUP BY
  date,
  userId

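As a temporary measure, I'm using the HAVING clause to test it against a single user (it will be removed in production). The query outputs the following: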

[Screenshot of the query output]

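This all works as expected and outputs the devices in the right order (tablet, tablet, tablet, mobile, desktop). However, I want to remove the duplicates so that the result is "tablet, mobile, desktop".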

I tried the UNIQUE() function, which does remove the duplicates, but it doesn't preserve the order, so the output becomes "desktop, mobile, tablet".

Any help would be appreciated!

Update

I updated the query to standard SQL and have now run into a different problem with the STRING_AGG() function:

SELECT
  date,
  userId,
  totalsvisits,
  STRING_AGG(DISTINCT devicecategory ORDER BY date ASC, vstime ASC) deviceAgg
FROM (
  SELECT
    date,
    visitStartTime vstime,
    cd.value userId,
    totals.visits totalsvisits,
    device.deviceCategory devicecategory
  FROM
    `12314124123123.ga_sessions_*`,
    UNNEST(customDimensions) AS cd
  WHERE
    cd.index=1
    AND cd.value IS NOT NULL
  GROUP BY
    date,
    visitStartTime,
    userId,
    totals.visits,
    device.deviceCategory
  HAVING
    userId="hip|7e4fbce9-bbfb-4677-aab0-dcd02851fdb4"
  ORDER BY
    date ASC,
    visitStartTime ASC)
GROUP BY
  date,
  userId,
  totalsvisits

The error returned is: "An aggregate function that has both DISTINCT and ORDER BY arguments can only ORDER BY columns that are arguments to the function".

Obviously, it works if I remove either DISTINCT or the ORDER BY clause from STRING_AGG, but I need both.


2 Answers


For the updated question, the following query produces the same error:

SELECT age_midpoint, STRING_AGG(DISTINCT country ORDER BY c DESC)
FROM (
  SELECT country, age_midpoint, COUNT(*) c
  FROM `fh-bigquery.stackoverflow.survey_results_2016`
  WHERE age_midpoint IS NOT null
  AND country LIKE '%u%'
  GROUP BY 1, 2
)
GROUP BY 1
ORDER BY 1

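This restriction makes sense: once you apply DISTINCT, you lose visibility of the variable you want to order by. Several rows with the same country can carry different values of c, so there is no single value to sort each distinct country by.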
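To make the restriction concrete, here is a small sketch on an inline array (hypothetical sample values, not from the survey table). Ordering a DISTINCT aggregate by its own argument is accepted, but that only gives alphabetical order. Ordering by any other column is what BigQuery rejects. The `sessions` table and `ts` column in the commented-out query are hypothetical.

```sql
#standardSQL
-- Allowed: the ORDER BY key is the aggregated argument itself, but the
-- result is sorted alphabetically, not by when each device first appeared.
SELECT STRING_AGG(DISTINCT device, ',' ORDER BY device) AS devices
FROM UNNEST(['tablet', 'tablet', 'mobile', 'desktop']) AS device
-- devices = 'desktop,mobile,tablet'

-- Rejected: 'tablet' can appear with several ts values, so there is no
-- single ts to sort the distinct 'tablet' by.
-- SELECT STRING_AGG(DISTINCT device, ',' ORDER BY ts) AS devices
-- FROM sessions  -- hypothetical table with columns device and ts
```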

Try this:

#standardSQL
SELECT age_midpoint, ARRAY_TO_STRING(ARRAY(
  SELECT country FROM (SELECT country, c FROM UNNEST(arr) GROUP BY country, c) ORDER BY c DESC
), ',')
FROM (
  SELECT age_midpoint, ARRAY_AGG(STRUCT(country, c)) arr
  FROM (
    SELECT country, age_midpoint, COUNT(*) c
    FROM `fh-bigquery.stackoverflow.survey_results_2016`
    WHERE age_midpoint IS NOT null
    AND country LIKE '%u%'
    GROUP BY 1, 2
  )
  GROUP BY 1
)
ORDER BY 1
LIMIT 1000

(See https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#creating-arrays-from-subqueries)
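The same array-subquery idea can also keep first-appearance order, which is what the original question asked for, instead of ordering by a count. This is a minimal sketch on a literal array with made-up values. In a real query the array would presumably come from something like `ARRAY_AGG(device.deviceCategory ORDER BY visitStartTime)`, which is an assumption about how it would be wired in. `WITH OFFSET` exposes each element's position, and `MIN(pos)` keeps the first one per device.

```sql
#standardSQL
SELECT ARRAY_TO_STRING(ARRAY(
  SELECT device
  -- WITH OFFSET gives each element's 0-based position in the ordered array.
  FROM UNNEST(['tablet', 'tablet', 'tablet', 'mobile', 'desktop']) AS device WITH OFFSET AS pos
  -- Collapse duplicates, then sort each device by its first position.
  GROUP BY device
  ORDER BY MIN(pos)
), ',') AS sequentialdevice
-- sequentialdevice = 'tablet,mobile,desktop'
```

Because the ordering key is computed per group before the array is built, DISTINCT is never needed, so the STRING_AGG restriction does not apply.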

answered 2018-01-18T00:24:42.917

Thanks Felipe, here's the complete query!

SELECT
  date,
  value,
  SUM(visits) visits,
  STRING_AGG(DISTINCT seqdevice) seqdevice,
  COUNT(DISTINCT seqdevice) countseqdevice
FROM (
  SELECT date, value, visits, ARRAY_TO_STRING(ARRAY(
    SELECT deviceCategory FROM (SELECT deviceCategory, c FROM UNNEST(arr) GROUP BY deviceCategory, c) ORDER BY c DESC
  ), ',') seqdevice
  FROM (
    SELECT date, visitStartTime, value, visits, ARRAY_AGG(STRUCT(deviceCategory, c)) arr
    FROM (
      SELECT date, visitStartTime, cd.value value, totals.visits visits, device.deviceCategory deviceCategory, COUNT(*) c
      FROM `xxxxxxxxxx`, UNNEST(customDimensions) AS cd
      WHERE cd.index=1 AND STARTS_WITH(cd.value,"hip|")
      GROUP BY 1, 2, 3, 4, 5
    )
    GROUP BY 1, 2, 3, 4
  )
  ORDER BY 2
)
GROUP BY 1, 2
HAVING
  value="hip|7e4fbce9-bbfb-4677-aab0-dcd02851fdb4"
ORDER BY countseqdevice DESC
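A note on ordering, offered as a hedged sketch rather than part of the original answer: in the query above, devices inside seqdevice are ordered by c DESC (a count), and the outer STRING_AGG(DISTINCT seqdevice) has no ORDER BY, so neither step is tied to visitStartTime. If the goal is the first-appearance order from the original question, one option is to drop DISTINCT entirely. First collapse each device to the time it was first seen with MIN, then order STRING_AGG by that time. The column names follow the question's standard SQL subquery (date, userId, vstime, devicecategory). The inline rows and the 'hip|abc' user are made-up sample data standing in for the GA export.

```sql
#standardSQL
-- Hypothetical sample rows standing in for the inner subquery's output.
WITH sessions AS (
  SELECT '20171127' AS date, 'hip|abc' AS userId, 100 AS vstime, 'tablet' AS devicecategory UNION ALL
  SELECT '20171127', 'hip|abc', 200, 'tablet' UNION ALL
  SELECT '20171127', 'hip|abc', 300, 'tablet' UNION ALL
  SELECT '20171127', 'hip|abc', 400, 'mobile' UNION ALL
  SELECT '20171127', 'hip|abc', 500, 'desktop'
),
-- Step 1: one row per device, keeping the time it was first seen.
first_seen AS (
  SELECT date, userId, devicecategory, MIN(vstime) AS first_vstime
  FROM sessions
  GROUP BY date, userId, devicecategory
)
-- Step 2: duplicates are already gone, so STRING_AGG needs no DISTINCT
-- and can ORDER BY any column.
SELECT
  date,
  userId,
  STRING_AGG(devicecategory, ',' ORDER BY first_vstime) AS deviceAgg
FROM first_seen
GROUP BY date, userId
-- deviceAgg = 'tablet,mobile,desktop'
```

Because GROUP BY removes the duplicates before STRING_AGG runs, ordering by first_vstime does not trigger the DISTINCT/ORDER BY error.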
answered 2018-01-18T17:47:55.510