
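I have an Oracle table that, for other reasons, has no primary key defined. It has 5 columns, and I want to be able to delete duplicate records (two records are duplicates if all 5 column values are the same). I came up with this SQL, but it doesn't seem to pick out the duplicate values: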

SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1

Sample records:

DATE_TIME                   SITE                                                                        RESPONSE_TIME  AVAIL_PERCENT  AGENT
20-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]           8.2610         100.00  45693
20-Apr-13 10.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]           6.2900         100.00  45693
24-Apr-13 07.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.7300         100.00  45693
24-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.7180         100.00  45693
08-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.5970         100.00  45693
20-May-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.7910         100.00  45693
25-Apr-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.3400         100.00  45693
08-May-13 05.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               2.4410         100.00  45693
09-May-13 01.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]                                      45693
21-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.5480         100.00  45693
23-Apr-13 02.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]          10.7070         100.00  45693
26-Apr-13 09.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               4.0070         100.00  45693
26-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               3.9350         100.00  45693
22-May-13 12.52.00.00 PM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]               4.1760         100.00  45693
23-Apr-13 02.53.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]           6.9500         100.00  45693
23-Apr-13 03.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]           6.0480         100.00  45693
23-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]           6.7600         100.00  45693

Any ideas?
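The GROUP BY ... HAVING COUNT(*) > 1 query reports a group only when all five column values match exactly. On the sample above it therefore returns nothing, because every DATE_TIME shown is different. Here is a minimal sketch for checking this locally. It uses Python's built-in sqlite3 module as a stand-in for Oracle, and it stores DATE_TIME as plain text rather than as a TIMESTAMP (both are assumptions). The sketch loads three of the sample rows and runs the same query, with COUNT(*) added to the select list. It then inserts one hypothetical exact copy and runs the query again:

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table
# (assumption: DATE_TIME is stored as text, not as an Oracle TIMESTAMP).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    " DATE_TIME TEXT, SITE TEXT, RESPONSE_TIME REAL,"
    " AVAIL_PERCENT REAL, AGENT INTEGER)"
)

LOGON = "Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]"
rows = [
    ("20-Apr-13 04.23.00.00 AM", LOGON, 8.2610, 100.00, 45693),
    ("20-Apr-13 10.23.00.00 AM", LOGON, 6.2900, 100.00, 45693),
    ("23-Apr-13 02.23.00.00 AM", LOGON, 10.7070, 100.00, 45693),
]
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?, ?)", rows)

# The question's query, with COUNT(*) added so the group size is visible.
DUP_QUERY = """
    SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT, COUNT(*)
    FROM table_name
    GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
    HAVING COUNT(*) > 1
"""

# Every DATE_TIME differs, so no group contains more than one row.
before = conn.execute(DUP_QUERY).fetchall()
print(before)  # []

# Add a hypothetical exact copy of the first row: now one group has two rows.
conn.execute("INSERT INTO table_name VALUES (?, ?, ?, ?, ?)", rows[0])
after = conn.execute(DUP_QUERY).fetchall()
print(after)
```

The query returns a row only after the hypothetical copy is added, and that row reports a count of 2.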


3 Answers


You can use ROWID as a pseudo primary key and run a query that deletes the extra rows, for example:

delete from
  my_table
where
  rowid not in (
    select   min(rowid)
    from     my_table
    group by column_1,
             column_2,
             column_3    -- list every column that defines uniqueness
  );

column_1 etc. are the set of columns that define the uniqueness of a row.

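For very large data sets with many duplicates there may be better-performing options, but this is a quick approach that is usually good enough.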
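SQLite tables also carry an implicit rowid, so you can try the same statement without an Oracle instance. Here is a minimal sketch using Python's sqlite3 module and a hypothetical three-column my_table. SQLite's rowid is an integer key, not Oracle's physical row address. The keep-MIN(rowid) logic works the same way in both:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: column_1..column_3 together define row uniqueness.
conn.execute("CREATE TABLE my_table (column_1 INTEGER, column_2 TEXT, column_3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "a", "x"),
        (1, "a", "x"),   # duplicate of the row above
        (2, "b", None),
        (2, "b", None),  # duplicate; GROUP BY treats the NULLs as equal
        (3, "c", "z"),
    ],
)

# Keep the lowest rowid in each group of identical rows and delete the rest.
conn.execute("""
    DELETE FROM my_table
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM my_table
        GROUP BY column_1, column_2, column_3
    )
""")

remaining = conn.execute(
    "SELECT column_1, column_2, column_3 FROM my_table ORDER BY column_1"
).fetchall()
print(remaining)  # [(1, 'a', 'x'), (2, 'b', None), (3, 'c', 'z')]
```

Rows whose grouping columns contain NULLs are still removed as duplicates of each other, because GROUP BY places NULLs in the same group. Oracle behaves the same way.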

answered 2013-05-29T13:27:22.333

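Are you planning to create a primary key? You can create a table for exceptions, and Oracle will put the rows that violate the primary key into that table. If there are violations, the primary key itself is not created, but you can analyze the bad data afterwards. =)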

create table tb1 
(field1 number, field2 varchar2(100));

--good data
insert into tb1 values (1, 'a');
insert into tb1 values (1, 'b');
insert into tb1 values (1, 'c');
insert into tb1 values (2, 'a');
insert into tb1 values (2, 'b');
insert into tb1 values (2, 'c');
-- bad data
insert into tb1 values (3, 'a');
insert into tb1 values (3, 'a');
commit;

-- a table for exceptions
create table tbl_exceptions (row_id rowid,
                             owner varchar2(30),
                             table_name varchar2(30),
                             constraint varchar2(30));

-- the primary key
-- if it fails, you have duplicate rows
alter table tb1 add constraint pk1 primary key (field1, field2)
exceptions into tbl_exceptions;

-- the bad data will be here
-- note that the join uses ROW_ID, the rowid column of the exceptions table
select tb1.*
from  tb1,
      tbl_exceptions 
where tb1.rowid = tbl_exceptions.row_id;
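EXCEPTIONS INTO is Oracle-specific and has no SQLite equivalent. As a swapped-in alternative (not this answer's method), a plain query can list every row whose key pair occurs more than once, together with its rowid. That is the information the exceptions table is meant to capture. Here is a minimal sketch using Python's sqlite3 module and the tb1 data above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (field1 INTEGER, field2 TEXT)")
conn.executemany(
    "INSERT INTO tb1 VALUES (?, ?)",
    [
        (1, "a"), (1, "b"), (1, "c"),  # good data
        (2, "a"), (2, "b"), (2, "c"),  # good data
        (3, "a"), (3, "a"),            # bad data: repeated key
    ],
)

# Every row that shares its (field1, field2) pair with some other row,
# i.e. the rows that would prevent a primary key on (field1, field2).
violators = conn.execute("""
    SELECT rowid, field1, field2
    FROM tb1 AS t
    WHERE EXISTS (
        SELECT 1
        FROM tb1 AS u
        WHERE u.field1 = t.field1
          AND u.field2 = t.field2
          AND u.rowid <> t.rowid
    )
    ORDER BY rowid
""").fetchall()
print(violators)  # [(7, 3, 'a'), (8, 3, 'a')]
```

Both copies of (3, 'a') are listed. The query reports every row in a duplicate group, not just the surplus copies.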

answered 2013-05-29T16:51:30.830

Since you are on Oracle, you can try the following to delete the duplicates:

DELETE my_table WHERE ROWID IN
(
  SELECT ROWID FROM
  (
    SELECT 
    DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT, ROWID, 
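    -- number the rows within each group of identical rows: 1, 2, 3, ...
    ROW_NUMBER() OVER (PARTITION BY
      DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT ORDER BY DATE_TIME) ITM_IDX
    FROM my_table
  )
  -- every row numbered above 1 is a surplus copy
  WHERE ITM_IDX > 1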
);
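You can try the same ROW_NUMBER() approach in SQLite, which supports window functions from version 3.25 onward. The sketch assumes that version or newer is the one bundled with your Python. Here is a minimal sketch using Python's sqlite3 module. It loads one sample row, two hypothetical exact copies of that row, and one other sample row. Within each partition it orders by rowid rather than DATE_TIME. Every partition column is identical within a group, so ORDER BY DATE_TIME leaves the surviving row arbitrary, while ORDER BY rowid always keeps the lowest rowid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    " DATE_TIME TEXT, SITE TEXT, RESPONSE_TIME REAL,"
    " AVAIL_PERCENT REAL, AGENT INTEGER)"
)
HOME = "Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]"
first = ("24-Apr-13 07.22.00.00 AM", HOME, 3.7300, 100.00, 45693)
other = ("24-Apr-13 03.52.00.00 AM", HOME, 3.7180, 100.00, 45693)
# One sample row, two hypothetical exact copies of it, and one other sample row.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", [first, first, first, other]
)

# Number the rows inside each group of identical rows (lowest rowid first)
# and delete every row numbered above 1.
conn.execute("""
    DELETE FROM my_table WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
                       ORDER BY rowid
                   ) AS itm_idx
            FROM my_table
        )
        WHERE itm_idx > 1
    )
""")

survivors = conn.execute("SELECT rowid, DATE_TIME FROM my_table ORDER BY rowid").fetchall()
print(survivors)  # [(1, '24-Apr-13 07.22.00.00 AM'), (4, '24-Apr-13 03.52.00.00 AM')]
```

SQLite requires DELETE FROM, whereas Oracle also accepts the shorter DELETE my_table form used in the answer.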
answered 2013-05-29T13:55:50.147