I'm having trouble selecting some values; I've reduced the problem to the following example. Basically, I have a table like this:
CREATE TABLE sample_table
(
  pk_id       NUMBER,
  business_id NUMBER
);
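Some of the business_id values in this table are duplicated, and I need to know the pk_id of those records.
Suppose I (further) set up and populate the table as follows: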
ALTER TABLE sample_table ADD (
CONSTRAINT sample_table_PK
PRIMARY KEY
(pk_id));
create sequence sample_sequence;
create trigger sample_trigger before insert on sample_table for each row
begin
  :new.pk_id := sample_sequence.nextval;
end;
/
insert into sample_table (business_id) values (1000);
insert into sample_table (business_id) values (1001);
insert into sample_table (business_id) values (1002);
insert into sample_table (business_id) values (1003);
insert into sample_table (business_id) values (1003);
insert into sample_table (business_id) values (1004);
Finding out which business_id values are duplicated is easy:
SELECT business_id, COUNT (business_id)
FROM sample_table
GROUP BY business_id
HAVING COUNT (business_id) > 1;
But I don't want the business_id, I want the pk_id.
I can get them by using the query above as a subquery:
select * from sample_table where business_id in (
SELECT business_id
FROM sample_table
GROUP BY business_id
HAVING COUNT (business_id) > 1);
Or by using COUNT(*) OVER (PARTITION BY ...) with subquery factoring:
with q as
(SELECT pk_id, business_id, COUNT ( * ) OVER (PARTITION BY business_id) totalcount
FROM sample_table)
select * from q
where q.totalcount > 1;
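But both of them make my query quite slow. The queries work fine for this example, but performance is poor on my production data of about 500,000 rows, so I'd like to know whether there is a better way to do this.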