In BigQuery, we tried to run:

SELECT day, AVG(value)/(1024*1024) FROM ( 
    SELECT value, UTC_USEC_TO_DAY(timestamp) as day, 
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [Datastore.PerformanceDatum]
    WHERE type = "MemoryPerf"
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc;

It should return a relatively small amount of data. Instead, we get this message:

Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby

What is causing this query to fail: the size of the subquery? Is there an equivalent query we could run that avoids the problem?


Edit in response to comments: if I add GROUP EACH BY (and remove the outer ORDER BY), the query fails, claiming that GROUP EACH BY is not parallelizable here.

1 Answer

I wrote an equivalent query that works for me:

SELECT day, AVG(value)/(1024*1024) FROM (
    SELECT data AS value, UTC_USEC_TO_DAY(dtimestamp) as day, 
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [io_sensor_data.moscone_io13]
    WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc;

If I run only the inner query, I get 3,660,624 results. Is your dataset bigger than that?
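
To compare sizes, one rough way to count the rows feeding the window function is a plain count with the same filter. This is only a sketch using the table and column names from the question; `row_count` is an illustrative alias. Since PERCENTILE_RANK() emits one row per input row, this count should match the number of results of the inner query:

```sql
-- Counts the rows that the inner query would rank.
-- Table and filter are copied from the question; row_count is a made-up alias.
SELECT COUNT(*) AS row_count
FROM [Datastore.PerformanceDatum]
WHERE type = "MemoryPerf";
```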

When grouping by day, the outer select gives me only 4 results. I'll try a different grouping to see whether I can hit a limit there:

SELECT day, AVG(value)/(1024*1024) FROM (
    SELECT data AS value, dtimestamp / 1000 as day, 
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank 
    FROM [io_sensor_data.moscone_io13]
    WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91 
GROUP BY day 
ORDER BY day desc;

That runs too, now with 57,862 distinct groups.

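I tried different combinations to reproduce your error, and I was able to get it once I doubled the initial amount of data. An easy "hack" to double the amount of data is to change: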

    FROM [io_sensor_data.moscone_io13]

to:

    FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13]

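Then I get the same error. How much data do you have? Can you apply an additional filter? Since you are already partitioning the PERCENTILE_RANK by day, could you restrict the query so that it analyzes only a fraction of the days (for example, only the last month)?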
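
As a rough sketch of that idea, the original query could be limited to roughly the last 30 days by adding one condition to the inner WHERE clause. This assumes that `timestamp` holds microseconds since the epoch (which the UTC_USEC_TO_DAY call suggests) and relies on the legacy SQL NOW() function, which returns the current time in microseconds. The 30-day window is an arbitrary choice, and I have not verified that it is small enough to stay under the resource limit for your data:

```sql
SELECT day, AVG(value)/(1024*1024) FROM (
    SELECT value, UTC_USEC_TO_DAY(timestamp) AS day,
         PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) AS rank
    FROM [Datastore.PerformanceDatum]
    WHERE type = "MemoryPerf"
      -- Assumption: timestamp is in microseconds, so 30 days is
      -- 30 * 24 * 60 * 60 * 1000000 microseconds before NOW().
      AND timestamp >= NOW() - 30 * 24 * 60 * 60 * 1000000
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day DESC;
```

If this runs, the window can be widened step by step to find out how much data the query tolerates.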

answered 2013-06-28T03:41:07.680