sql - Keeping one instance of a duplicate that appears in either of two columns

Question

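I have a table with a column of unique IDs and a column holding each unique ID's spouse ID (if they have a spouse). The problem is that every spouse ID also appears in the unique ID column, so when I pull a list and try to treat each couple as a single unit, I often double-count couples.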

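What is a good, efficient way to take a given list of unique IDs, check whether each person's spouse is also in that same list, and return only one unique ID per couple?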

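The problem is a bit more complicated because sometimes the two spouses are not both in the same list, so it is not simply a matter of keeping one person whenever they are married. If the spouse is not in the same list, I want to make sure I keep the one who is. I also want to make sure I keep everyone who has a NULL value in the spouse ID column.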

A subset of the table in question:

Unique_ID      Spouse_ID
    1              2
    2              1
    3             NULL
    4             NULL
    5              10
    6              25
    7             NULL
    8              9
    9              8
   10              5

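In this excerpt, IDs 3, 4, and 7 are all single. IDs 1, 2, 5, 8, 9, and 10 have spouses that appear in the Unique_ID column. ID 6 has a spouse whose ID does not appear in the Unique_ID column. So I would want to keep IDs 1 (or 2), 3, 4, 5 (or 10), 6, 7, and 8 (or 9). Hopefully that makes sense.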

score 1 · Accepted Answer

My inclination is to combine the two lists and remove the duplicates:

select distinct id
from ((select id
       from t
      ) union all
      (select spouse_id
       from t
       where spouse_id in (select id from t)
      )
     ) t
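
A quick way to try this on the question's sample rows, sketched with Python's built-in sqlite3 module. The table name t and the columns id / spouse_id follow the query above (the question calls them Unique_ID and Spouse_ID), and the parentheses around the two branches of the union are dropped because SQLite does not accept them there. On this data the query returns all ten ids, since every spouse_id that passes the filter already appears as an id, so it is the join-based approach that follows that actually collapses each couple to one row.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory table t(id, spouse_id).
ROWS = [(1, 2), (2, 1), (3, None), (4, None), (5, 10),
        (6, 25), (7, None), (8, 9), (9, 8), (10, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# Same logic as the union query above, with SQLite-compatible syntax
# and an ORDER BY added so the output is deterministic.
UNION_QUERY = """
select distinct id
from (select id from t
      union all
      select spouse_id from t
      where spouse_id in (select id from t)
     ) u
order by id
"""

union_ids = [row[0] for row in conn.execute(UNION_QUERY)]
print(union_ids)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```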

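However, your question asks for an efficient way. Another way to think about this is to add a new column that holds the spouse id if it is in the id list and NULL otherwise (this uses a left outer join). Then there are three cases: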

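There is no spouse id in the list, so use the id.
The id is less than the spouse id. Use it.
The spouse id is less than the id. Discard this record, because the spouse's record is the one being used.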

Here is one explicit way of expressing this:

select IdToUse
from (select t.*, tspouse.id tsid,
             (case when tspouse.id is null then t.id
                   when t.id < tspouse.id then t.id
                   else NULL
              end) as IdToUse
      from t left outer join
           t tspouse
           on t.spouse_id = tspouse.id
     ) t
where IdToUse is not null;
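
As a sanity check, the explicit query can be run against the question's sample rows. This is only a sketch using Python's built-in sqlite3 module. The table name t and columns id / spouse_id are the answer's names rather than the question's, the subquery alias picked is made up for the example, and the ORDER BY is added purely to make the output deterministic.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory table t(id, spouse_id).
ROWS = [(1, 2), (2, 1), (3, None), (4, None), (5, 10),
        (6, 25), (7, None), (8, 9), (9, 8), (10, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# tspouse.id is NULL when the spouse is missing from the id column
# (either no spouse at all, or a spouse who is not in this list).
EXPLICIT_QUERY = """
select IdToUse
from (select t.*, tspouse.id tsid,
             (case when tspouse.id is null then t.id
                   when t.id < tspouse.id then t.id
                   else NULL
              end) as IdToUse
      from t left outer join
           t tspouse
           on t.spouse_id = tspouse.id
     ) picked
where IdToUse is not null
order by IdToUse
"""

kept = [row[0] for row in conn.execute(EXPLICIT_QUERY)]
print(kept)  # [1, 3, 4, 5, 6, 7, 8]
```

The result matches the set the question asks for: one id per couple (the smaller one), plus the singles and ID 6, whose spouse is not in the list.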

You can simplify this to:

  select t.*, tspouse.id tsid,
         (case when tspouse.id is null then t.id
               when t.id < tspouse.id then t.id
               else NULL
          end) as IdToUse
  from t left outer join
       t tspouse
       on t.spouse_id = tspouse.id
  where tspouse.id is null or
        t.id < tspouse.id

score 0 · Accepted Answer

Two tables is just plain bad design.

select id
from t  -- table is a reserved word, so the table is called t here
where id < spouse_id
   or spouse_id is null
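
One difference from the join-based answer is worth checking: this filter never looks at whether the spouse is actually in the list. The sketch below, using Python's built-in sqlite3 module, compares the two on the question's sample rows plus one hypothetical extra row (30, 25) that is not in the question. The table name t and the column spouse_id are assumptions carried over from the first answer. Under those assumptions, the plain filter drops ID 30 because 30 is not less than 25, even though spouse 25 is absent, while the left join keeps it.

```python
import sqlite3

# The question's sample rows, plus (30, 25): a HYPOTHETICAL person whose
# spouse id is smaller than their own id and is not present in the list.
ROWS = [(1, 2), (2, 1), (3, None), (4, None), (5, 10),
        (6, 25), (7, None), (8, 9), (9, 8), (10, 5), (30, 25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# Plain comparison: keeps a row only if id < spouse_id or there is no spouse.
SIMPLE_QUERY = """
select id
from t
where id < spouse_id
   or spouse_id is null
order by id
"""

# Self left join: also keeps a row when the spouse is missing from the id column.
JOIN_QUERY = """
select t.id
from t left outer join
     t tspouse
     on t.spouse_id = tspouse.id
where tspouse.id is null or
      t.id < tspouse.id
order by t.id
"""

simple_ids = [row[0] for row in conn.execute(SIMPLE_QUERY)]
join_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
print(simple_ids)  # [1, 3, 4, 5, 6, 7, 8]
print(join_ids)    # [1, 3, 4, 5, 6, 7, 8, 30]
```

On the question's ten rows alone the two queries agree, because the only absent spouse (25) happens to have the larger id.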
