sql - WHERE condition in an IN subquery affects the main query - is this a feature or a bug?

Question

Suppose there are two tables:

Table A: A1, A2, A_Other
Table B: B1, B2, B_Other

In the examples below, "is something" stands for a condition checked against a fixed value (for example, = 'ABC' or < 45).

I wrote the following query (1):

Select * from A
Where A1 IN (
    Select Distinct B1 from B
    Where B2 is something
    And A2 is something
);

What I actually meant to write was (2):

Select * from A
Where A1 IN (
    Select Distinct B1 from B
    Where B2 is something
)
And A2 is something;

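Strangely, both queries return the same result. Looking at the explain plan for query 1, it appears that when the subquery is executed, the condition "A2 is something" is deferred and used as a filter on the main query's result, because it does not apply to the subquery.

I would normally expect query 1 to fail, because the subquery on its own fails: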

Select Distinct B1 from B
Where B2 is something
And A2 is something; --- ERROR:  column "A2" does not exist

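But I found that this is not the case: Postgres defers the inapplicable subquery condition to the main query.

Is this standard behavior or a Postgres anomaly? Where is it documented, and what is this feature called?

Also, I found that if I add a column A2 to table B, only query 2 works as expected. In that case the reference to A2 in query 2 still refers to A.A2, but the reference in query 1 refers to the new column B.A2, because it can now be applied directly within the subquery.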

score 5 · Accepted Answer

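This is a good question, one that many people run into but never stop to look at.

What you are writing is a subquery in the WHERE clause, not an inline view in the FROM clause. There is a difference.

When you write a subquery in the SELECT or WHERE clause, you can access the tables in the FROM clause of the main query. This does not happen only in Postgres; it is standard behavior that can be observed in all the leading RDBMSs, including Oracle, SQL Server and MySQL.

When you run the first query, the optimizer looks at the whole query and decides which conditions to check and when. It is this behavior of the optimizer that you are seeing as the condition being deferred to the main query: the optimizer works out that evaluating this condition in the main query itself is faster and does not affect the final result.

If you run only the subquery, with the main query commented out, it is bound to return an error at the position you mentioned, because the referenced column cannot be found.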
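
The contrast between the standalone and the nested subquery is easy to reproduce. What follows is only a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for Postgres (an assumption: SQLite resolves the outer column the same way here, though its error message differs from the Postgres one). The sample rows and the concrete conditions B2 = 'ABC' and A2 < 45 are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A1 INTEGER, A2 INTEGER, A_Other TEXT);
    CREATE TABLE B (B1 INTEGER, B2 TEXT, B_Other TEXT);
    INSERT INTO A VALUES (1, 10, 'x'), (2, 99, 'y'), (3, 20, 'z');
    INSERT INTO B VALUES (1, 'ABC', 'p'), (2, 'ABC', 'q'), (3, 'DEF', 'r');
""")

# Standalone subquery: table A is not in scope, so A2 cannot be resolved.
try:
    conn.execute("SELECT DISTINCT B1 FROM B WHERE B2 = 'ABC' AND A2 < 45")
    standalone_error = None
except sqlite3.OperationalError as exc:
    standalone_error = str(exc)

# The same subquery nested in a query on A: A2 now resolves to the outer A.A2.
nested_rows = conn.execute("""
    SELECT A1 FROM A
    WHERE A1 IN (
        SELECT DISTINCT B1 FROM B
        WHERE B2 = 'ABC' AND A2 < 45
    )
    ORDER BY A1
""").fetchall()

print("standalone:", standalone_error)
print("nested:", nested_rows)
```

Only the row of A whose own A2 passes the filter and whose A1 appears in B survives, which is the same result query 2 would give on this data.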

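In your last paragraph, you mention that you added a column A2 to table B. What you observed is correct. It happens because of implicit references. If you do not qualify a column with a table alias, the database engine first looks for the column in the tables in the subquery's FROM clause. Only if the column is not found there does it look at the tables in the main query. If you use the following query, it still returns the same result: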

Select * from A aa -- Check the alias
Where A1 IN (
    Select Distinct B1 from B bb
    Where B2 is something
    And aa.A2 is something -- Check the reference
);
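
The lookup order described above, and the way an explicit alias protects against it, can be sketched as follows. This is an illustration under assumptions, not a statement about Postgres internals: it uses Python's standard-library sqlite3 module in place of Postgres, and the sample data, the conditions B2 = 'ABC' and A2 < 45, and the value written into the new B.A2 column are all invented.

```python
import sqlite3

# Query 1 with an unqualified A2 inside the subquery.
UNQUALIFIED = """
    SELECT A1 FROM A aa
    WHERE A1 IN (
        SELECT DISTINCT B1 FROM B bb
        WHERE B2 = 'ABC' AND A2 < 45
    )
    ORDER BY A1
"""

# The same query with A2 explicitly tied to the outer table's alias.
QUALIFIED = """
    SELECT A1 FROM A aa
    WHERE A1 IN (
        SELECT DISTINCT B1 FROM B bb
        WHERE B2 = 'ABC' AND aa.A2 < 45
    )
    ORDER BY A1
"""

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in (assumption)
conn.executescript("""
    CREATE TABLE A (A1 INTEGER, A2 INTEGER);
    CREATE TABLE B (B1 INTEGER, B2 TEXT);
    INSERT INTO A VALUES (1, 10), (2, 99);
    INSERT INTO B VALUES (1, 'ABC'), (2, 'ABC');
""")

# B has no A2 yet, so the unqualified name falls back to the outer A.A2.
before = conn.execute(UNQUALIFIED).fetchall()

# Give B its own A2 column; the unqualified name now binds to B.A2.
conn.executescript("""
    ALTER TABLE B ADD COLUMN A2 INTEGER;
    UPDATE B SET A2 = 0;
""")
after_unqualified = conn.execute(UNQUALIFIED).fetchall()
after_qualified = conn.execute(QUALIFIED).fetchall()

print(before, after_unqualified, after_qualified)
```

After the column is added, the unqualified query silently changes meaning and returns an extra row, while the aliased version keeps returning the original result. Qualifying every column in a subquery is a cheap way to avoid this kind of surprise.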

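Perhaps you can find more on this in Korth's book on relational databases, but I am not sure. I have answered your question based on my own observations. I know that this happens and why; I just do not know how to point you to further references.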

score 2 · Accepted Answer

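It has already been explained why a correlated subquery in the WHERE clause can reference all columns of the tables in the FROM list.

Beyond that, a JOIN or an EXISTS semi-join is often much faster than a correlated subquery. I would rewrite this as the following 100% equivalent query: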

SELECT a.*
FROM   a
JOIN   (
   SELECT DISTINCT b1
   FROM   b
   WHERE  b2 is something
   ) b ON b.b1 = a.a1 
WHERE  a.a2 is something;

Or, better yet:

SELECT *
FROM   a
WHERE  EXISTS (
   SELECT 1 
   FROM   b
   WHERE  b.b1 = a.a1 
   AND    b.b2 is something
   )
AND    a.a2 is something;
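
To check that the IN, JOIN and EXISTS forms really return the same rows, a small test harness helps. The sketch below uses Python's standard-library sqlite3 module rather than Postgres (an assumption made so the example is self-contained), with invented sample data and the placeholder conditions replaced by b2 = 'ABC' and a2 < 45. It says nothing about relative speed, only about results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in (assumption)
conn.executescript("""
    CREATE TABLE a (a1 INTEGER, a2 INTEGER);
    CREATE TABLE b (b1 INTEGER, b2 TEXT);
    INSERT INTO a VALUES (1, 10), (2, 99), (3, 20), (4, 30);
    -- b1 = 1 appears twice: duplicates in b must not multiply rows of a
    INSERT INTO b VALUES (1, 'ABC'), (1, 'ABC'), (2, 'ABC'), (3, 'DEF');
""")

queries = {
    "in": """
        SELECT a.* FROM a
        WHERE a1 IN (SELECT DISTINCT b1 FROM b WHERE b2 = 'ABC')
        AND   a2 < 45
        ORDER BY a1
    """,
    "join": """
        SELECT a.* FROM a
        JOIN (SELECT DISTINCT b1 FROM b WHERE b2 = 'ABC') b ON b.b1 = a.a1
        WHERE a.a2 < 45
        ORDER BY a.a1
    """,
    "exists": """
        SELECT * FROM a
        WHERE EXISTS (SELECT 1 FROM b WHERE b.b1 = a.a1 AND b.b2 = 'ABC')
        AND   a.a2 < 45
        ORDER BY a.a1
    """,
}

# Run each form and collect its rows under the form's name.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)
```

The duplicate b1 value is there on purpose: the JOIN form only stays equivalent because the derived table applies DISTINCT, whereas the EXISTS form never produces duplicates in the first place.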

score 2 · Accepted Answer

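Correlated subquery: a subquery is called a correlated subquery if its result depends on the values of columns from its parent query's tables. This is standard behavior, not a bug.

The columns that a correlated query depends on do not have to be included in the parent query's select list.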

Select * from A
Where A1 IN (
    Select Distinct B1 from B
    Where B2 is something
    And A2 is something
);

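A2 is a column of table A, and the parent query is on table A. That means A2 can be referenced in the subquery. The query above may be slower than the one below.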

Select * from A
Where A2 is something And A1 IN (
    Select Distinct B1 from B
    Where B2 is something
);

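That is because A2 from the parent query is referenced inside the loop. It depends on the conditions under which the data is to be fetched. If the subquery were something like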

Select Distinct B1 from B
Where B2 is A2

then we would have to reference the parent query's column. Alternatively, we could use a join.
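
As a rough illustration of that last case, here is a sketch of a subquery that genuinely needs the parent column, next to a join that returns the same rows. It assumes "B2 is A2" means B2 = A2, uses Python's standard-library sqlite3 module in place of Postgres, and runs on invented sample data in which A1 is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in (assumption)
conn.executescript("""
    CREATE TABLE A (A1 INTEGER, A2 TEXT);
    CREATE TABLE B (B1 INTEGER, B2 TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO B VALUES (1, 'x'), (2, 'q'), (3, 'z'), (3, 'z');
""")

# Correlated form: the subquery is evaluated against the current row's A.A2.
correlated = conn.execute("""
    SELECT A1 FROM A
    WHERE A1 IN (SELECT DISTINCT B1 FROM B WHERE B2 = A.A2)
    ORDER BY A1
""").fetchall()

# Join form: both conditions move into the ON clause; DISTINCT removes the
# duplicates caused by repeated rows in B (safe here because A1 is unique).
joined = conn.execute("""
    SELECT DISTINCT A.A1 FROM A
    JOIN B ON B.B1 = A.A1 AND B.B2 = A.A2
    ORDER BY A.A1
""").fetchall()

print(correlated, joined)
```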

score 0 · Accepted Answer

The results are not strange: a subquery can reference the parent query. This is called a correlated subquery and is very common. In your example you used the IN operator, but a common way to optimize a query that uses IN is to replace IN with the EXISTS operator and a correlated subquery.

To elaborate on Erwin's comment about EXISTS being faster: IN sometimes requires the query to find all the values of the set, whereas EXISTS only requires the first occurrence to be found to satisfy the condition. However, it may be the case that the query planner optimizes both to the same plan. Using EXISTS explicitly still helps the optimizer construct the intended query plan faster.
