sql - Why does PostgreSQL reject a query that groups by a column of one table while selecting a column of another?

Question

I am using PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-22ubuntu5) 4.7.2, 64-bit. My problem is with joining two tables, call them temp1 and temp2.

The table structures are

marks_map
marks   int
stud_id  int

student
stud_id int
class_id int

This is my query

select class_id,stud_id,count(marks) 
from student as s 
inner join marks_map as m on (s.stud_id=m.stud_id) group by stud_id

Here I get the error

ERROR:  column "s.class_id" must appear in the GROUP BY clause or be used in an aggregate function

Why does this error occur? If I add class_id to the GROUP BY, the query runs successfully.

score 1 · Accepted Answer

You have to add the class_id attribute to your GROUP BY clause because the SELECT part of the statement applies no aggregate function to this attribute.

In a GROUP BY statement, every attribute in the SELECT list that you have not aggregated must also appear in the GROUP BY clause.

For example:

SELECT
    non_aggregated_attr1, non_aggregated_attr2, non_aggregated_attr3, sum(attr4)
FROM
    some_table
GROUP BY
    non_aggregated_attr1, non_aggregated_attr2, non_aggregated_attr3
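Applied to the tables in the question, that rule might look like the sketch below. It assumes the schema given in the question; the column qualification and the marks_count alias are illustrative additions, not part of the original query.

```sql
-- Sketch: every non-aggregated column in the SELECT list also appears in GROUP BY.
SELECT s.class_id,
       s.stud_id,
       count(m.marks) AS marks_count  -- marks_count is an illustrative alias
FROM student AS s
INNER JOIN marks_map AS m ON s.stud_id = m.stud_id
GROUP BY s.class_id, s.stud_id;
```

Note that this returns one row per (class_id, stud_id) pair, which is only the same as one row per student if each student belongs to a single class.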

score 0 · Accepted Answer

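You should be able to understand the problem from a simplified case that does not even involve a JOIN.

The query SELECT x,[other columns] GROUP BY x expresses the fact that for every distinct value of x, the [other columns] must be output with only one row per x.

Now consider a simplified example where the student table has two entries:

stud_id=1, class_id=1
stud_id=1, class_id=2

And we ask for SELECT stud_id,class_id FROM student GROUP BY stud_id.

There is only one distinct value of stud_id, which is 1.

So we are telling the SQL engine: give me one row with stud_id=1 and the value of class_id that comes with it. The problem is that there is not one but two such values, 1 and 2. So which one should it choose? Instead of choosing at random, the SQL engine raises an error saying that the question is conceptually bogus in the first place, because there is no rule that says each distinct value of stud_id has its own corresponding class_id.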

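On the other hand, if the non-grouped output columns are aggregate functions that turn a series of values into a single one, such as min, max, or count, then they provide the missing rule that says how to get only one value out of several. This is why the SQL engine is fine with, for example: SELECT stud_id,count(class_id) FROM student GROUP BY stud_id;.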
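The two-row example can be reproduced with a small script such as the one below. This is only a sketch: the temporary table name student_demo and the output aliases are hypothetical, chosen so the script does not touch a real student table.

```sql
-- Hypothetical two-row table matching the simplified example.
CREATE TEMPORARY TABLE student_demo (stud_id int, class_id int);
INSERT INTO student_demo VALUES (1, 1), (1, 2);

-- Rejected by PostgreSQL: class_id is neither grouped nor aggregated.
-- SELECT stud_id, class_id FROM student_demo GROUP BY stud_id;

-- Accepted: each aggregate collapses the group's class_id values into one value.
SELECT stud_id,
       count(class_id) AS n_classes,
       min(class_id)   AS lowest_class,
       max(class_id)   AS highest_class
FROM student_demo
GROUP BY stud_id;
-- one row: stud_id = 1, n_classes = 2, lowest_class = 1, highest_class = 2
```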

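Also, when faced with the error column "somecolumn" must appear in the GROUP BY clause, you do not want to just keep adding columns to the GROUP BY until the error goes away, as if it were purely a syntax problem. It is a semantic problem, and each column added to the GROUP BY changes the meaning of the question submitted to the SQL engine.

That is, GROUP BY x,y means: for each distinct value of the (x,y) pair. It does not mean: GROUP BY x, and hey, since that leads to an error, let's throw in y as well!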
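To see how the extra column changes the question being asked, the same two rows can be grouped both ways. A minimal sketch, using an inline VALUES list under the hypothetical name s so that no table needs to exist:

```sql
-- One group per distinct stud_id: a single row (stud_id = 1, n_rows = 2).
WITH s(stud_id, class_id) AS (
    VALUES (1, 1), (1, 2)
)
SELECT stud_id, count(*) AS n_rows
FROM s
GROUP BY stud_id;

-- One group per distinct (stud_id, class_id) pair: two rows, each with n_rows = 1.
WITH s(stud_id, class_id) AS (
    VALUES (1, 1), (1, 2)
)
SELECT stud_id, class_id, count(*) AS n_rows
FROM s
GROUP BY stud_id, class_id;
```

Both queries are valid, but they answer different questions, so the choice between them should follow from what you want to count, not from which one silences the error.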

score 0 · Accepted Answer

That is how GROUP BY works.

You can check your data, for example

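select
    -- distinct: the join repeats class_id once per mark row
    array_agg(distinct s.class_id) as arr_class_id,
    -- stud_id exists in both tables, so it must be qualified
    s.stud_id, count(m.marks)
from student as s
   inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id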

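and see how many class_id values each group has. Sometimes class_id depends on stud_id (the array for each group contains only one element), in which case you can use a dummy aggregate such as: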

select
    max(s.class_id) as class_id,
    -- stud_id exists in both tables, so it must be qualified
    s.stud_id, count(m.marks)
from student as s
   inner join marks_map as m on (s.stud_id=m.stud_id)
group by s.stud_id
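If that dependency is declared in the schema, the dummy aggregate may not be needed at all: PostgreSQL 9.1 and later accept ungrouped columns of a table when that table's primary key is in the GROUP BY. The sketch below assumes stud_id is the primary key of student, which the question does not state; the marks_count alias is illustrative.

```sql
-- Assumption: student.stud_id is declared as the primary key (not stated in the question).
-- ALTER TABLE student ADD PRIMARY KEY (stud_id);

SELECT s.class_id,          -- functionally dependent on the grouped primary key
       s.stud_id,
       count(m.marks) AS marks_count
FROM student AS s
INNER JOIN marks_map AS m ON s.stud_id = m.stud_id
GROUP BY s.stud_id;
```

Without the primary key constraint, this query fails with the same error as in the question.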
