In the end, a colleague solved the problem.
Let's see how many rows we are dealing with, duplicates included:
SELECT COUNT(*) FROM _sample_table_delme_data_files;
 count
-------
 12728
(1 row)
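The count above is the total number of rows in the staging table. A minimal sketch of a query that reports the duplication itself, assuming id is the column that is supposed to be unique and is never NULL (the column aliases are made up for illustration):

```sql
-- total_rows:   every row currently in the staging table
-- distinct_ids: how many rows should remain after cleanup
-- surplus_rows: how many rows the cleanup has to remove
-- Assumption: id is never NULL (count(DISTINCT id) ignores NULLs).
SELECT count(*)                      AS total_rows,
       count(DISTINCT id)            AS distinct_ids,
       count(*) - count(DISTINCT id) AS surplus_rows
FROM _sample_table_delme_data_files;
```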
Now we add another column to the source table so that we can tell otherwise identical rows apart:
ALTER TABLE _sample_table_delme_data_files ADD COLUMN id2 serial;
Now we can see the duplicates:
SELECT id, id2 FROM _sample_table_delme_data_files ORDER BY id LIMIT 10;
   id   | id2
--------+------
 198748 | 6449
 198748 |   85
 198801 |  166
 198801 | 6530
 198829 |   87
 198829 | 6451
 198926 |   88
 198926 | 6452
 199062 | 6532
 199062 |  168
(10 rows)
And delete them, removing the row with the highest id2 for each duplicated id:
DELETE FROM _sample_table_delme_data_files
WHERE id2 IN (SELECT max(id2)
              FROM _sample_table_delme_data_files
              GROUP BY id
              HAVING COUNT(*) > 1);
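The DELETE above removes one row per duplicated id, so a single run is enough only when every id occurs at most twice. If an id could occur three or more times, the statement would have to be repeated until nothing is left to delete. A hedged variant that keeps only the lowest id2 per id in a single pass, assuming the helper column id2 is still present:

```sql
-- Keep the row with the smallest id2 for each id and delete all others.
-- id2 is a serial column, hence NOT NULL, so NOT IN is safe to use here.
DELETE FROM _sample_table_delme_data_files
WHERE id2 NOT IN (SELECT min(id2)
                  FROM _sample_table_delme_data_files
                  GROUP BY id);
```

This is meant as a replacement for the statement above, not an extra step. Run after it, it simply finds nothing more to delete when every id had at most two copies.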
Let's check whether it worked:
SELECT id FROM _sample_table_delme_data_files GROUP BY id HAVING COUNT(*)>1;
 id
----
(0 rows)
Drop the helper column:
ALTER TABLE _sample_table_delme_data_files DROP COLUMN id2;
ALTER TABLE
Insert the remaining rows into the target table:
INSERT INTO data_files (SELECT * FROM _sample_table_delme_data_files);
INSERT 0 6364
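For comparison, PostgreSQL's DISTINCT ON can pick one row per id while copying, which avoids the helper column altogether. This is only a sketch of an alternative to the whole procedure, not something to run in addition to it. It assumes the staging table has exactly the same columns as data_files and that it does not matter which of the duplicate rows is kept:

```sql
-- Alternative to the steps above: copy one arbitrary row per id.
-- Extend ORDER BY with further columns to control which duplicate is kept.
INSERT INTO data_files
SELECT DISTINCT ON (id) *
FROM _sample_table_delme_data_files
ORDER BY id;
```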