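I'm running a GROUP BY with COUNT(*) over a dataset, and I'd like to calculate each group's percentage of the total.

For example, in this query I'd like to know what share of the total each state's count(*) represents (where the total is SELECT COUNT(*) FROM publicdata:samples.natality):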

SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP by state

There are several ways to do this in SQL, but I haven't found a way to do it in BigQuery. Does anyone know how?

Thanks!

4 Answers

Check out RATIO_TO_REPORT, one of the recently announced window functions:

SELECT state, ratio * 100 AS percent FROM (
 SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
 FROM [publicdata:samples.natality]
 GROUP by state
)

state   percent
AL      1.4201828131159113   
AK      0.23521048665998198  
AZ      1.3332896746620975   
AR      0.7709591206172346   
CA      10.008298605982642
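
For readers unfamiliar with RATIO_TO_REPORT, the arithmetic it performs over the grouped counts can be sketched roughly as follows: each value divided by the sum of all values. This is only an illustration in Python; the rows are made-up sample data rather than the natality table, and `ratio_to_report` is a hypothetical helper name, not a BigQuery API.

```python
from collections import Counter


def ratio_to_report(counts):
    """Divide each group's count by the sum over all groups (hypothetical helper)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


# Made-up sample rows standing in for the `state` column.
states = ["CA", "CA", "CA", "AL", "AK", "AZ", "AZ", "CA"]

# Counter plays the role of GROUP BY state with COUNT(*).
counts = Counter(states)

# Multiply by 100 to turn each ratio into a percentage, as the outer query does.
percents = {state: ratio * 100 for state, ratio in ratio_to_report(counts).items()}

for state, pct in sorted(percents.items()):
    print(state, pct)
```

Because every ratio shares the same denominator, the percentages add up to 100 across all groups.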
answered 2013-06-11T15:57:16.390

Felipe's answer, adapted to BigQuery's standard SQL dialect rather than the legacy SQL dialect, looks like this:

select state, 100*(state_count / total) as pct
from (
  SELECT state, count(*) AS state_count, sum(count(*)) OVER() AS total
  FROM `bigquery-public-data.samples.natality` 
  GROUP by state
) s

The documentation for analytic functions (also known as "window functions") in BigQuery standard SQL is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts

answered 2018-06-07T09:02:20.747

You can self-join against the total, using a dummy value as the join key. For example:

SELECT
  t1.state AS state,
  t1.cnt AS cnt,
  100 * t1.cnt / t2.total as percent
FROM (
  SELECT
    state,
    COUNT(*) AS cnt,
    1 AS key
  FROM
    [publicdata:samples.natality]
  WHERE state is not null
  GROUP BY
    state) AS t1
JOIN (
  SELECT
    COUNT(*) AS total,
    1 AS key
  FROM
    [publicdata:samples.natality]) AS t2
ON t1.key = t2.key
ORDER BY percent DESC
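
To try the dummy-key join without a BigQuery project, here is a minimal sketch that runs the same query shape against an in-memory SQLite table. The table contents are made up, the column `key` is renamed to `join_key` for this sketch, and `100.0` is used so SQLite performs floating-point division; none of this is BigQuery-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (state TEXT)")

# Made-up rows: two CA births, one AL birth, and one row with no state.
rows = [("CA",), ("CA",), ("AL",), (None,)]
conn.executemany("INSERT INTO natality (state) VALUES (?)", rows)

query = """
SELECT
  t1.state AS state,
  t1.cnt AS cnt,
  100.0 * t1.cnt / t2.total AS percent
FROM (
  SELECT state, COUNT(*) AS cnt, 1 AS join_key
  FROM natality
  WHERE state IS NOT NULL
  GROUP BY state) AS t1
JOIN (
  -- One-row subquery holding the grand total, joined via the dummy key.
  SELECT COUNT(*) AS total, 1 AS join_key
  FROM natality) AS t2
ON t1.join_key = t2.join_key
ORDER BY percent DESC
"""

result = conn.execute(query).fetchall()
for state, cnt, percent in result:
    print(state, cnt, percent)
```

Note that the total in `t2` still counts the row with a null state while `t1` filters it out, so the percentages in this sample add up to 75 rather than 100. The query above behaves the same way on any rows whose state is null.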
answered 2013-06-05T20:15:47.380

You can use a window function to get each group's percentage of the total without a subquery (an improvement on evan_b's solution):

SELECT 
   state
   ,100 * count(*) / (sum(count(*)) OVER()) as pct
FROM  
   `bigquery-public-data.samples.natality` 
GROUP BY 
   state
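
The single-query pattern can be checked locally with a rough sketch like the one below, which runs it against an in-memory SQLite table and scales the result by 100 to report a percentage. It assumes SQLite 3.25 or later (the first release with window functions) and uses made-up rows; BigQuery itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (state TEXT)")

# Made-up sample rows standing in for the public natality table.
rows = [("CA",), ("CA",), ("AL",), ("AK",)]
conn.executemany("INSERT INTO natality (state) VALUES (?)", rows)

query = """
SELECT
  state,
  -- COUNT(*) is evaluated per group; SUM(...) OVER () then adds up
  -- those per-group counts across the whole result set.
  100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct
FROM natality
GROUP BY state
ORDER BY state
"""

result = conn.execute(query).fetchall()
for state, pct in result:
    print(state, pct)
```

The empty `OVER ()` clause defines a window covering every grouped row, which is why no subquery or join is needed to obtain the grand total.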
answered 2021-03-01T12:03:09.290