sql - SQL: delete rows that share the same set of values

Question

I have a table like this:

ID  ItemsID
1   2
1   3
1   4
2   3
2   4
2   2

I want to delete the rows of an ID such as ID=2, because in my case the tuple 2,3,4 counts as the same as the tuple 3,4,2.

How can I do this in SQL?

score 1 · Accepted Answer

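Sorry, I didn't notice the Oracle tag in time. I'll leave the MySQL solution here for reference anyway. Apparently some Oracle versions have something similar to GROUP_CONCAT().

It may not be the most elegant solution, but it gets the job done: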

DELETE FROM t WHERE ID IN (
  SELECT ID
  FROM (SELECT ID, GROUP_CONCAT(ItemsID ORDER BY ItemsID) AS tuple FROM t GROUP BY ID) AS tuples
  WHERE EXISTS (
    SELECT TRUE
    FROM (SELECT ID, GROUP_CONCAT(ItemsID ORDER BY ItemsID) AS tuple FROM t GROUP BY ID) tuples2
    WHERE tuples2.tuple = tuples.tuple
    AND tuples2.ID < tuples.ID
  )
)

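You may need to raise group_concat_max_len (for example with SET SESSION group_concat_max_len = 100000). GROUP_CONCAT() truncates its result at that length, which is 1024 by default, so two long but different sets could otherwise compare as equal.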
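For Oracle, the closest counterpart to GROUP_CONCAT() is LISTAGG, available from 11g Release 2. As a rough sketch only (it is not part of the original answer), the same idea could look like the statement below. It assumes the table is called t as in the MySQL query, that no (ID, ItemsID) pair is repeated, and that the concatenated list fits into the VARCHAR2 result of LISTAGG. The ROW_NUMBER() step is an addition used here in place of the EXISTS self-join.

```sql
-- Sketch: delete every ID whose sorted ItemsID list already occurs under a smaller ID.
-- Assumes Oracle 11gR2+ (LISTAGG) and the table/column names of the MySQL query.
DELETE FROM t
WHERE ID IN (
  SELECT ID
  FROM (
    SELECT ID,
           -- number the IDs that share one item list; the smallest ID gets rn = 1
           ROW_NUMBER() OVER (PARTITION BY tuple ORDER BY ID) AS rn
    FROM (
      -- one row per ID with its items as an ordered, comma-separated string
      SELECT ID,
             LISTAGG(ItemsID, ',') WITHIN GROUP (ORDER BY ItemsID) AS tuple
      FROM t
      GROUP BY ID
    )
  )
  WHERE rn > 1
);
```

Ordering by ID DESC inside ROW_NUMBER() would keep the largest ID of each group instead of the smallest.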

score 1 · Accepted Answer

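Another approach, for Oracle 10gR1 or later. In this example, IDs 1, 2 and 6 have the same set of ItemsID values, ID 3 has a different set, and IDs 4 and 5 share yet another set. So we will delete IDs 2, 6 and 5, since they are duplicates. To do that, we represent the ItemsID values of each ID as a nested table and use the multiset except operator to determine whether two groups contain the same elements: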

-- set-up
create table tb_table(
  id number,
  itemsid number);

Table created

insert into tb_table(id, itemsid)
  select 1, 2 from dual union all
  select 1, 3 from dual union all
  select 1, 4 from dual union all
  select 2, 4 from dual union all
  select 2, 3 from dual union all
  select 2, 2 from dual union all
  select 3, 2 from dual union all
  select 3, 3 from dual union all
  select 3, 6 from dual union all
  select 3, 4 from dual union all
  select 4, 1 from dual union all
  select 4, 2 from dual union all
  select 4, 3 from dual union all
  select 5, 1 from dual union all
  select 5, 2 from dual union all
  select 5, 3 from dual union all
  select 6, 2 from dual union all
  select 6, 4 from dual union all
  select 6, 3 from dual;

19 rows inserted

commit;

Commit complete

create or replace type t_numbers as table of number;
/

Type created

-- contents of the table
select *
  from tb_table;

        ID    ITEMSID
---------- ----------
         1          2
         1          3
         1          4
         2          4
         2          3
         2          2
         3          2
         3          3
         3          6
         3          4
         4          1
         4          2
         4          3
         5          1
         5          2
         5          3
         6          2
         6          4
         6          3

19 rows selected

-- delete every id whose item collection equals that of a smaller id
delete from tb_table
 where id in (with DataGroups as (
                select id
                     , grp
                     , (select count(*) from table(grp)) cnt
                  from (select id
                             , cast(collect(itemsid) as t_numbers) grp
                          from tb_table
                         group by id
                       )
              )
              select distinct id2
                from (select dg1.id as id1
                           , dg2.id as id2
                           , (dg1.grp multiset except dg2.grp) res
                           , dg1.cnt
                        from DataGroups dg1
                       cross join DataGroups dg2
                       where dg1.cnt = dg2.cnt
                       order by dg1.id
                     ) t
               where res is empty
                 and id2 > id1
             );

9 rows deleted

select *
  from tb_table;

        ID    ITEMSID
---------- ----------
         1          2
         1          3
         1          4
         3          2
         3          3
         3          6
         3          4
         4          1
         4          2
         4          3

10 rows selected
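The emptiness test at the heart of the delete can be tried on its own. The query below is a minimal sketch that assumes the t_numbers type from the set-up exists; the literal collections and the column aliases are made up for illustration.

```sql
-- Assumes the t_numbers nested table type created in the set-up.
-- MULTISET EXCEPT removes the right-hand elements from the left-hand
-- collection, regardless of the order in which they are stored.
select case
         when (t_numbers(2, 3, 4) multiset except t_numbers(4, 3, 2)) is empty
         then 'empty' else 'not empty'
       end as same_elements
     , case
         when (t_numbers(2, 3) multiset except t_numbers(2, 3, 4)) is empty
         then 'empty' else 'not empty'
       end as left_is_subset
  from dual;
```

By the semantics of the operator, both differences are empty, even though the second pair of collections is not equal. That is why the delete statement also requires dg1.cnt = dg2.cnt: an empty difference only proves that the left collection is contained in the right one.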

score 0 · Accepted Answer

This is the best I could come up with, but somehow I feel there must be a simpler solution:

delete from items
where id in (
  select id 
  from (
    with counts as (
       select id, 
              count(*) as cnt
       from items
       group by id
    )
    select c1.id, row_number() over (order by c1.id) as rn
    from counts c1
      join counts c2 
        on c1.id <> c2.id and c1.cnt = c2.cnt
    and not exists (select i1.itemsid
                    from items i1
                    where i1.id = c1.id
                    minus 
                    select i2.itemsid
                    from items i2
                    where i2.id = c2.id)
  ) t
  where rn <> 1
);

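It works for any number of itemsid values.

Combining rn <> 1 with the ascending sort in the window definition keeps the smallest id in the table (1 in your case). If you want to keep the highest id instead, change the sort order to over (order by c1.id desc).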
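One thing to watch: row_number() is computed over the whole join result, not per group of identical ids. As far as I can tell, that is fine for the two ids in the question, but with three identical ids, or with several separate groups of duplicates, ids that should be kept also receive rn values other than 1. A possible variant, offered as a sketch only and assuming the same items table, drops the row numbering and deletes every id for which a smaller id with the same items exists:

```sql
-- Sketch: delete each id that has an identical item set under a smaller id.
-- Assumes Oracle and the items(id, itemsid) table used above.
delete from items
where id in (
  with counts as (
    select id,
           count(*) as cnt
    from items
    group by id
  )
  select c1.id
  from counts c1
    join counts c2
      on c1.id > c2.id        -- c2 is the smaller id that will be kept
     and c1.cnt = c2.cnt      -- same number of items
  where not exists (select i1.itemsid   -- no item of c1 is missing from c2
                    from items i1
                    where i1.id = c1.id
                    minus
                    select i2.itemsid
                    from items i2
                    where i2.id = c2.id)
);
```

Changing the join condition to c1.id < c2.id would keep the highest id of each group instead.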
