I have a Postgres table that describes relationships between entities. It is populated by a process that I cannot modify. Here is an example of that table:

+-----+-----+
| e1  | e2  |
|-----+-----|
|  A  |  B  |
|  C  |  D  |
|  D  |  C  |
| ... | ... |
+-----+-----+

I want to write a SQL query that removes all unnecessary relationships from the table. For example, the relationship [D, C] is redundant because it is already defined by [C, D].

I have a query that deletes using a self join, but it removes every row involved in the relationship, e.g.:

DELETE FROM foo USING foo b WHERE foo.e2 = b.e1 AND foo.e1 = b.e2;

Results in:

+-----+-----+
| e1  | e2  |
|-----+-----|
|  A  |  B  |
| ... | ... |
+-----+-----+

However, I need a query that leaves one of the relationships in place. It doesn't matter which one remains, [C, D] or [D, C], but not both.

I feel like there is a simple solution here but it's escaping me.

2 Answers

A general solution is to use the pseudo-column ctid, which is always unique:

DELETE FROM foo USING foo b WHERE foo.e2 = b.e1 AND foo.e1 = b.e2
    AND foo.ctid > b.ctid;

Incidentally, this keeps the tuple whose physical location is closest to the first data page of the table.
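
To see the effect end to end, here is a minimal sketch that rebuilds the question's sample data in a temporary table and runs the delete. The column type (text) and the use of a temp table are assumptions made for illustration; the original table definition is not shown in the question.

```sql
-- Hypothetical setup mirroring the question's example (column types assumed).
CREATE TEMP TABLE foo (e1 text, e2 text);
INSERT INTO foo (e1, e2) VALUES ('A', 'B'), ('C', 'D'), ('D', 'C');

-- For each mirrored pair, delete the row with the larger ctid
-- and keep the one with the smaller ctid.
DELETE FROM foo
USING foo b
WHERE foo.e2 = b.e1
  AND foo.e1 = b.e2
  AND foo.ctid > b.ctid;

-- Inspect what is left: one row per relationship.
SELECT e1, e2 FROM foo ORDER BY e1, e2;
```

A row such as (C, C) mirrors only itself, and a row's ctid is never greater than its own, so a single self-referencing row is left alone.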

answered 2013-09-12T17:21:35.010

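Assuming that exact duplicate rows are prevented by a constraint, there are always at most two rows for a given relationship: (C,D) and (D,C) in your example. The same constraint also means that in such a pair the two columns hold distinct values: the pair (C,C) might be legal, but it cannot be duplicated.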

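Assuming that the data type involved has a sane definition of >, you can add a condition that the row to be deleted is the one where the first column > the second column, leaving the other row untouched.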

In your sample query, this means adding AND foo.e1 > foo.e2.
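
Written out in full, the query with that extra condition might look like the sketch below. It assumes the table name foo and the columns e1 and e2 from the question, and that > is defined for the column type.

```sql
-- Of each mirrored pair, delete only the row whose first column
-- sorts after its second column, e.g. (D, C) goes and (C, D) stays.
DELETE FROM foo
USING foo b
WHERE foo.e2 = b.e1
  AND foo.e1 = b.e2
  AND foo.e1 > foo.e2;
```

Unlike the ctid approach, which row survives here depends only on the values, not on physical storage, so the result is the same no matter how the rows were inserted.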

answered 2013-09-12T17:34:25.880