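Standard deviation analysis can be a useful way to find outliers. Is there a way to feed the result of this query (which finds the value four standard deviations above the mean)...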
SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as high FROM [publicdata:samples.natality];
Result = 12.721342001626912
...into another query that reports which states and dates have the most babies born more than 4 standard deviations heavier than the mean?
SELECT state, year, month, COUNT(*) AS outlier_count
FROM [publicdata:samples.natality]
WHERE
(weight_pounds > 12.721342001626912)
AND
(state != '' AND state IS NOT NULL)
GROUP BY state, year, month
ORDER BY outlier_count DESC;
Result:
Row  state  year  month  outlier_count
1    MD     1990  12     22
2    NY     1989  10     17
3    CA     1991  9      14
Essentially, it would be great to do all of this in a single query, without pasting the threshold in by hand.
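For illustration, one possible shape for such a single query is sketched below. It is not tested against the dataset. It assumes the dialect used above accepts a CROSS JOIN against a one-row subquery; the aliases n and t are hypothetical names chosen for this sketch.

```sql
-- Sketch only: assumes CROSS JOIN with a one-row subquery is supported.
-- The aliases n and t are illustrative and do not come from the original queries.
SELECT n.state AS state, n.year AS year, n.month AS month,
       COUNT(*) AS outlier_count
FROM [publicdata:samples.natality] AS n
CROSS JOIN (
  -- One-row subquery: mean plus four standard deviations of birth weight.
  SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) AS high
  FROM [publicdata:samples.natality]
) AS t
WHERE n.weight_pounds > t.high
  AND n.state != '' AND n.state IS NOT NULL
GROUP BY state, year, month
ORDER BY outlier_count DESC;
```

Because the subquery returns exactly one row, the cross join only attaches the computed threshold to every natality row, so the hard-coded 12.721342001626912 is no longer needed. If CROSS JOIN is not available, joining both sides on a constant dummy key would be an alternative with the same effect.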