23

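I have rows in an Oracle database table that should be unique for a combination of two fields, but no unique constraint is set up on the table, so I need to find all the rows that violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.

My table has three relevant columns: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out whether any rows violate this rule by flushing them out with an SQL query.

I have tried the following SQL (suggested by this previous question), but it doesn't work for me (I get ORA-00918: column ambiguously defined):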

SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

Can someone suggest what I'm doing wrong, and/or how to solve this?


9 Answers

42
SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        )
WHERE   rn > 1
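
To see which rows this pattern returns, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable; it needs an SQLite build with window functions, 3.25 or later). The sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# Number the rows inside each (station_id, obs_year) group, then keep
# every row after the first one in its group.
extra_rows = conn.execute(
    """
    SELECT entity_id, station_id, obs_year
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (
                   PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM mytable t
    )
    WHERE rn > 1
    ORDER BY entity_id
    """
).fetchall()
print(extra_rows)
```

Note that the row with the lowest entity_id in each group gets rn = 1 and is therefore not returned. That suits a cleanup where one row per combination is kept, but not a listing of every row involved in a violation.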
answered 2010-08-17T15:21:14.003
12
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 from mytable t2 Where
       t1.station_id = t2.station_id
       AND t1.obs_year = t2.obs_year
       AND t1.RowId <> t2.RowId)
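
As a rough illustration of the EXISTS approach, the sketch below runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Oracle here: its implicit rowid plays the role of Oracle's ROWID, and the sample rows are invented.

```python
import sqlite3

# Hypothetical sample data; SQLite's implicit rowid stands in for Oracle's ROWID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 20, 2000), (5, 10, 2000)],
)

# A row is reported when some *other* physical row shares its
# (station_id, obs_year) pair.
violating_rows = conn.execute(
    """
    SELECT entity_id, station_id, obs_year
    FROM mytable t1
    WHERE EXISTS (SELECT 1 FROM mytable t2
                  WHERE t1.station_id = t2.station_id
                    AND t1.obs_year = t2.obs_year
                    AND t1.rowid <> t2.rowid)
    ORDER BY entity_id
    """
).fetchall()
print(violating_rows)
```

Every member of a duplicated group is returned, including the first one, so the result lists all rows that take part in a violation.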
answered 2010-08-17T15:21:16.860
2

Rewrite your query as:

SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

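I think the ambiguous column error (ORA-00918) occurs because you are selecting columns whose names appear in both the table and the subquery, but you did not specify whether you want them from dupes or from mytable (aliased as t1).

answered 2010-08-17T15:19:02.090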
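
The same kind of ambiguity can be reproduced outside Oracle. The following is a minimal sketch using Python's sqlite3 module, where SQLite reports its own "ambiguous column name" error rather than ORA-00918. The sample rows, including a fully duplicated row, are assumptions chosen so that the join returns something.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (1, 10, 2000), (2, 10, 2001)],
)

# Shared FROM/JOIN part: both t1 and dupes expose all three column names.
JOIN_TAIL = """
FROM mytable t1
INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
"""

# Unqualified select list: the engine cannot tell which source is meant.
try:
    conn.execute("SELECT entity_id, station_id, obs_year" + JOIN_TAIL)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Qualifying each column with t1 removes the ambiguity.
rows = conn.execute(
    "SELECT t1.entity_id, t1.station_id, t1.obs_year" + JOIN_TAIL
).fetchall()
print(rows)
```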
2

Change the three fields in the initial select to:

SELECT
t1.entity_id, t1.station_id, t1.obs_year
answered 2010-08-17T15:19:26.577
1

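Couldn't you create a new table that includes the unique constraint, and then copy the data across row by row, ignoring failures?

answered 2010-08-17T15:19:02.857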
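
A minimal sketch of that idea, written with Python's sqlite3 module rather than Oracle: the table name mytable_unique, the sample rows, and the use of sqlite3.IntegrityError to detect a rejected row are all assumptions for illustration. In Oracle the copy loop would be written differently, but the principle is the same.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (5, 10, 2000)],
)

# Copy target with the constraint the original table is missing
# (mytable_unique is a made-up name).
conn.execute(
    """
    CREATE TABLE mytable_unique (
        entity_id INTEGER, station_id INTEGER, obs_year INTEGER,
        UNIQUE (station_id, obs_year))
    """
)

# Read all source rows first, then insert one by one and collect the rejects.
source_rows = conn.execute(
    "SELECT entity_id, station_id, obs_year FROM mytable ORDER BY entity_id"
).fetchall()
rejected = []
for row in source_rows:
    try:
        conn.execute("INSERT INTO mytable_unique VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)  # this row collides with one already copied
print(rejected)
```

Which row of a duplicated group survives depends on the copy order, so the rejected list contains only the later arrivals, not every row involved in a violation.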
1

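You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key for mytable and is irrelevant to finding duplicates, you should not group by it in the dupes subquery.

Try: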

SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
SELECT station_id, obs_year FROM mytable 
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
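
To illustrate why entity_id should stay out of the GROUP BY, here is a small sketch using Python's sqlite3 module in place of Oracle. The sample data is invented and assumes, as above, that entity_id is unique per row.

```python
import sqlite3

# Hypothetical sample data with a unique entity_id per row;
# SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001)],
)

# Grouping by the unique entity_id puts every row in its own group,
# so no group ever has COUNT(*) > 1.
with_entity = conn.execute(
    """
    SELECT entity_id, station_id, obs_year FROM mytable
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1
    """
).fetchall()

# Grouping only by the columns that should be unique exposes the duplicate pair.
without_entity = conn.execute(
    """
    SELECT station_id, obs_year FROM mytable
    GROUP BY station_id, obs_year HAVING COUNT(*) > 1
    """
).fetchall()
print(with_entity, without_entity)
```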
answered 2010-08-17T16:03:57.180
0
SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        )
WHERE   rn > 1

Quassnoi's query is the most efficient for large tables. I compared the costs as follows:

SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
WHERE EXISTS (SELECT 1 from trn_refil_book b Where
       a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
       AND a.RowId <> b.RowId)
       ;

gave a cost of 1322341

SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
INNER JOIN (
SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b 
GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c 
ON 
 a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
;

gave a cost of 1271699

while

SELECT  dist_code, book_date, book_no
FROM    (
        SELECT  t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
          ORDER BY t.dist_code) AS rn
        FROM    trn_refil_book t
        ) p
WHERE   p.rn > 1
;

gave a cost of 1021984

The table was not indexed.

answered 2013-12-03T04:29:54.970
0
  SELECT entity_id, station_id, obs_year
    FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1

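Specify the fields you want to find duplicates on in both the SELECT and the GROUP BY.

It works by using GROUP BY to find any rows that match other rows on the specified columns. The HAVING COUNT(*) > 1 clause says that we are only interested in rows that occur more than once (and are therefore duplicates).

answered 2014-08-07T22:35:30.087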
0

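I think many of the solutions here are cumbersome and hard to follow. I had a three-column primary key constraint and needed to find the duplicates, so here is an option: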

SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1
answered 2019-05-02T20:23:53.923