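I need to select matched pairs from two tables containing similarly structured data. A "matched pair" here means two rows that reference each other through the `matchid` column.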

Single-table matched pair example:

TABLE
----
id | matchid
1  |   2
2  |   1

IDs 1 and 2 are a matched pair because each row's matchid points to the other.

Now the real question: what is the best (fastest) way to select the matched pairs that appear in both tables?

Table ONE (id, matchid)
Table TWO (id, matchid)

Example data:

ONE                TWO
----               ----
id  | matchid      id  | matchid
1   |   2          2   |   3
2   |   3          3   |   2
3   |   2
4   |   5
5   |   4

The desired result is a single row with IDs 2 and 3.

RESULT
----
id  | id
2   | 3

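This is because 2 and 3 are a matched pair in both table ONE and table TWO. 4 and 5 are a matched pair in ONE but not in TWO, so we don't select them. 1 and 2 are not a matched pair at all, because 2 has no entry matching back to 1.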

I can get the matched pairs from a single table:

SELECT a.id, b.id 
    FROM ONE a JOIN ONE b
       ON a.id = b.matchid AND a.matchid = b.id
    WHERE a.id < b.id
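
To check the single-table query against the sample data, here is a minimal sketch. It assumes an in-memory SQLite database driven from Python as a stand-in for the real server; the table contents are the ONE rows from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
conn.executemany(
    "INSERT INTO one (id, matchid) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)],
)

# Self-join: keep rows whose partner points back at them.
# The a.id < b.id filter emits each pair once instead of twice.
single_table_pairs = conn.execute(
    """
    SELECT a.id, b.id
    FROM one a JOIN one b
      ON a.id = b.matchid AND a.matchid = b.id
    WHERE a.id < b.id
    ORDER BY a.id
    """
).fetchall()

print(single_table_pairs)
```

On the sample data this returns the pairs (2, 3) and (4, 5); the row (1, 2) is dropped because 2 does not point back to 1.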

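How should I build a query that selects only the matched pairs that appear in both tables?

Should I:

  • Run the query above for each table and combine the two with WHERE EXISTS?
  • Run the query above for each table and JOIN the results together?
  • Run the query above, then JOIN table TWO twice, once for "id" and once for "matchid"?
  • Run the query above for each table and loop over the results in PHP to compare them?
  • Somehow filter table TWO so that we only need to look at the IDs of matched pairs from table ONE?
  • Do something completely different?

(Since this is an efficiency question, it's worth noting that matches will be very sparse, perhaps 1 in 1000 or fewer, and each table will have 100,000+ rows.)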
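
For illustration, a variant of the "JOIN table TWO twice" option could look like the sketch below. It joins TWO once for each row of the pair rather than once per column. The in-memory SQLite database and the index names `idx_one_pair` and `idx_two_pair` are assumptions made for the sketch. Whether this is the fastest option on 100,000+ rows would have to be checked with EXPLAIN on the real data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE one (id INTEGER, matchid INTEGER);
    CREATE TABLE two (id INTEGER, matchid INTEGER);
    -- Hypothetical composite indexes so each pair lookup can be an index probe.
    CREATE INDEX idx_one_pair ON one (id, matchid);
    CREATE INDEX idx_two_pair ON two (id, matchid);
    """
)
conn.executemany("INSERT INTO one VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)])
conn.executemany("INSERT INTO two VALUES (?, ?)", [(2, 3), (3, 2)])

# a/b form the matched pair inside ONE; t1/t2 require the same two rows in TWO.
pairs_in_both = conn.execute(
    """
    SELECT a.id, b.id
    FROM one a
    JOIN one b  ON b.id  = a.matchid AND b.matchid  = a.id
    JOIN two t1 ON t1.id = a.id      AND t1.matchid = a.matchid
    JOIN two t2 ON t2.id = b.id      AND t2.matchid = b.matchid
    WHERE a.id < b.id
    ORDER BY a.id
    """
).fetchall()

print(pairs_in_both)
```

With the sample data only the pair (2, 3) survives, since (4, 5) has no rows in TWO.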

3 Answers

I think I understand what you mean. You want to filter the records where the pair exists in both tables.

SELECT  LEAST(a.ID, a.MatchID) ID, GREATEST(a.ID, a.MatchID) MatchID
FROM    One a
        INNER JOIN Two b
            ON a.ID = b.ID AND
                a.matchID = b.matchID
GROUP   BY LEAST(a.ID, a.MatchID), GREATEST(a.ID, a.MatchID)
HAVING  COUNT(*) > 1
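
A quick way to try this query on the question's sample data is sketched below. It assumes SQLite run from Python, and because SQLite has no LEAST/GREATEST, the two-argument scalar MIN/MAX are swapped in. Note that `COUNT(*) > 1` only identifies a pair if (id, matchid) is unique within each table; a duplicated one-way row would also be counted twice.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
conn.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
conn.executemany("INSERT INTO one VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)])
conn.executemany("INSERT INTO two VALUES (?, ?)", [(2, 3), (3, 2)])

# Two-argument MIN/MAX are SQLite's scalar stand-ins for LEAST/GREATEST.
# Both directions of a pair collapse into one group, so a full pair has count 2.
grouped_pairs = conn.execute(
    """
    SELECT MIN(a.id, a.matchid) AS id, MAX(a.id, a.matchid) AS matchid
    FROM one a
    INNER JOIN two b
        ON a.id = b.id AND a.matchid = b.matchid
    GROUP BY MIN(a.id, a.matchid), MAX(a.id, a.matchid)
    HAVING COUNT(*) > 1
    """
).fetchall()

print(grouped_pairs)
```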
Answered 2013-04-17T05:13:15.310

Try this query:

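   select 
    O.id,
    O.matchid
    from 
    ONE O
    where 
    -- CONCAT builds the 'id~matchid' key; in MySQL '+' is numeric addition, not string concatenation
    CONCAT(CAST(O.id as CHAR(50)), '~', CAST(O.matchid as CHAR(50)))
    in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)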

Edited query:

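select distinct
Least(O.id,O.matchid) ID,
Greatest(O.id,O.matchid) MatchID
from 
ONE O
where 
-- CONCAT builds the 'id~matchid' key; in MySQL '+' is numeric addition, not string concatenation
-- the row itself must exist in TWO ...
CONCAT(CAST(O.id as CHAR(50)), '~', CAST(O.matchid as CHAR(50)))
in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)
-- ... and so must its reversed counterpart
and CONCAT(CAST(O.matchid as CHAR(50)), '~', CAST(O.id as CHAR(50)))
in (select CONCAT(CAST(T.id as CHAR(50)), '~', CAST(T.matchid as CHAR(50))) from TWO T)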

SQL Fiddle

Answered 2013-04-17T05:26:49.247

Naive version, which checks all four rows that need to exist:

-- EXPLAIN ANALYZE
WITH both_one AS (
        SELECT o.id, o.matchid
        FROM one o
        WHERE o.id < o.matchid
        AND EXISTS ( SELECT * FROM one x WHERE x.id = o.matchid AND x.matchid = o.id)
        )
, both_two AS (
        SELECT t.id, t.matchid
        FROM two t
        WHERE t.id < t.matchid
        AND EXISTS ( SELECT * FROM two x WHERE x.id = t.matchid AND x.matchid = t.id)
        )
SELECT *
FROM both_one oo
WHERE EXISTS (
        SELECT *
        FROM both_two tt
        WHERE tt.id = oo.id AND tt.matchid = oo.matchid
        );

This one is simpler:

-- EXPLAIN ANALYZE
WITH pair AS (
        SELECT o.id, o.matchid
        FROM one o
        WHERE EXISTS ( SELECT * FROM two x WHERE x.id = o.id AND x.matchid = o.matchid)
        )
SELECT *
FROM pair pp
WHERE EXISTS (
        SELECT *
        FROM pair xx
        WHERE xx.id = pp.matchid AND xx.matchid = pp.id
        )
AND pp.id < pp.matchid
        ;
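
As a rough check of the simpler version, the sketch below runs it against the question's sample data. SQLite driven from Python is an assumption, and the extra one-way row (4, 5) in TWO is hypothetical test data added to show that a pair is only reported when all four rows exist.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
conn.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
conn.executemany("INSERT INTO one VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)])
# (4, 5) is hypothetical extra data: TWO has it in one direction only.
conn.executemany("INSERT INTO two VALUES (?, ?)", [(2, 3), (3, 2), (4, 5)])

# Step 1 (CTE): rows of ONE that also exist verbatim in TWO.
# Step 2: keep a row only if its reversed counterpart survived step 1 as well.
cte_pairs = conn.execute(
    """
    WITH pair AS (
        SELECT o.id, o.matchid
        FROM one o
        WHERE EXISTS (SELECT 1 FROM two x
                      WHERE x.id = o.id AND x.matchid = o.matchid)
    )
    SELECT pp.id, pp.matchid
    FROM pair pp
    WHERE pp.id < pp.matchid
      AND EXISTS (SELECT 1 FROM pair xx
                  WHERE xx.id = pp.matchid AND xx.matchid = pp.id)
    ORDER BY pp.id
    """
).fetchall()

print(cte_pairs)
```

Only (2, 3) comes back: (4, 5) passes the first step but is dropped in the second, because (5, 4) is missing from TWO.
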
Answered 2013-04-17T16:28:17.867