In my BQ standard SQL query, when I use a few analytic functions (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions), I get this error:

Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

The query computes fields more or less like the following one:

case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

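When I split the query and run each part separately, every part runs fine. However, I don't want to split the query into several, because I need all the fields together in one final BQ view.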

Is there any way to work around this error?

UPD: Here is a sample query. When I add more fields, it stops with the error above.

SELECT 
distinct
Date
,channelGrouping
,country
,browser
,deviceCategory
,operatingSystem

#Visits by all dimensions
,count(distinct concat(fullvisitorid,cast(visitid as string))) 
over (partition by concat(Y,m,d),channelGrouping,country,browser,deviceCategory,operatingSystem)
as Visits 

#Daily Users Browser
,case when 1 = ROW_NUMBER() over (partition by Y,m,d,browser)
then count (distinct fullvisitorid) 
over (partition by Y,m,d,browser)
else null end as UniqueVisitorsDailyBrowser


#Weekly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,U,channelGrouping)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,U,channelGrouping)
else null end as NewUniqueVisitorsWeeklyChannel

#Monthly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

FROM GA_Export_Schema

1 Answer

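When I need to answer several different questions with a single query, I usually try to use some UNION ALL operations together with a key column that tells the data apart.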

Based on your query, I tested this one against our dataset:

SELECT
  date date,
  country country,
  channel channel,
  browser browser,
  cat cat,
  os os,
  MAX(all_keys_visits) all_keys_visits,
  MAX(browser_visits) browser_visits,
  MAX(week_new_channel_users) week_new_channel_users,
  MAX(month_new_os_users) month_new_os_users from(
  SELECT
    date,
    country,
    channel,
    browser,
    cat,
    os,
    visits AS all_keys_visits,
    -- date is part of the partition so the browser count stays per day
    MAX(CASE
        WHEN tmp = 'browser' THEN visits END) OVER(PARTITION BY date, browser) browser_visits,
    MAX(CASE
        WHEN tmp = 'weekly_new_users' THEN visits END) OVER(PARTITION BY channel, week_date) week_new_channel_users,
    MAX(CASE
        WHEN tmp = 'monthly_new_users' THEN visits END) OVER(PARTITION BY os, month_date) month_new_os_users from(
          SELECT
            tmp,
            date,
            week_date,
            month_date,
            country,
            channel,
            browser,
            cat,
            os,
            visits  FROM (
              SELECT
                'all_visitors' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                geonetwork.country country,
                channelGrouping channel,
                device.browser browser,
                device.devicecategory cat,
                device.operatingSystem os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE 1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))  GROUP BY tmp,  date,  channel,  country,  browser,  cat,  os )
            UNION ALL (
              SELECT
               'browser' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                '' channel,
                device.browser browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
                FROM `project_id.dataset_id.ga_sessions*`  WHERE 1 = 1 AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
                GROUP BY tmp,  date,  channel,  country,  browser,  cat,  os )
            UNION ALL (
              SELECT
               'weekly_new_users' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                channelGrouping channel,
                '' browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CASE
                   WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os )
          UNION ALL (
            SELECT
              'monthly_new_users' AS tmp,
              date,
              FORMAT_DATE("%W", parse_DATE('%Y%m%d',
                  date)) week_date,
              FORMAT_DATE("%m", parse_DATE('%Y%m%d',
                  date)) month_date,
              '' country,
              '' channel,
              '' browser,
              '' cat,
              device.operatingSystem os,
              COUNT(DISTINCT
                CASE
                  WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
            FROM
              `project_id.dataset_id.ga_sessions*`
            WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os ) ) )
GROUP BY
  date,
  country,
  channel,
  browser,
  cat,
  os
HAVING
  (country != ''
    AND channel != ''
    AND browser != ''
    AND cat != ''
    AND os != '')

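Basically, in each UNION branch I create a key, then aggregate by that key and by the values you want to analyze. After that, I just removed the rows whose fields had been filled in as empty strings.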
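To make the pattern easier to see, here is a minimal sketch of the same idea reduced to two branches. The inline `sessions` rows and the CTE name `tagged` are made up for illustration (they are not part of the GA export), and the window partitions by `date` and `browser` on the assumption that a daily browser count is wanted:

```sql
-- Hypothetical inline rows standing in for the GA export (assumption, for illustration only).
WITH sessions AS (
  SELECT '20170501' AS date, 'Chrome' AS browser, 'Windows' AS os, 'u1' AS fullvisitorid, 1 AS visitid
  UNION ALL SELECT '20170501', 'Chrome', 'Macintosh', 'u2', 2
  UNION ALL SELECT '20170501', 'Safari', 'Macintosh', 'u2', 3
  UNION ALL SELECT '20170502', 'Chrome', 'Windows', 'u1', 4
),
tagged AS (
  -- Branch 1: full-grain counts, tagged with the key 'all_visitors'.
  SELECT 'all_visitors' AS tmp, date, browser, os,
    COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS STRING))) AS visits
  FROM sessions
  GROUP BY date, browser, os
  UNION ALL
  -- Branch 2: counts per (date, browser) only; os is padded with ''.
  SELECT 'browser' AS tmp, date, browser, '' AS os,
    COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS STRING))) AS visits
  FROM sessions
  GROUP BY date, browser
)
SELECT date, browser, os, all_keys_visits, browser_visits
FROM (
  SELECT date, browser, os,
    visits AS all_keys_visits,
    -- Copy the pre-aggregated browser count onto every row of the same day and browser.
    MAX(CASE WHEN tmp = 'browser' THEN visits END)
      OVER (PARTITION BY date, browser) AS browser_visits
  FROM tagged
)
-- Drop the padded helper rows; only full-grain rows remain.
WHERE os != ''
ORDER BY date, browser, os
```

With these sample rows, both Chrome rows for 20170501 should come back with `browser_visits = 2`, while each keeps its own `all_keys_visits = 1`. The window function here only runs over already-aggregated rows, which is the point of the approach: the expensive COUNT(DISTINCT ...) happens in plain GROUP BY branches rather than inside an analytic function over the raw sessions.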

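I tried it on 30 days of data, which came to a few gigabytes processed, and it still finished in under 20 seconds, so it may work for you as well. Note that by running the unions separately and aggregating afterwards, you end up working on a much smaller volume of data, which is what avoids the resource exhaustion.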

Answered 2017-05-08T23:25:45.467