I'm using PostGIS to process some complex land use data. I have several cases where exact duplicate polygons were created. I'd like to delete these duplicates, and I am currently using the following self-join SQL to remove them:

delete from landusetable where objectid in 
(select max(x.objectid) from landusetable x JOIN landusetable y ON 
ST_Equals(x.shape, y.shape) WHERE x.objectid <> y.objectid group by x.shape);

This works fine for removing the duplicate with the highest objectid value, but it removes only that one row per set of identical polygons. If there are 3 or more duplicate polygons, I need to run this statement multiple times until the delete statement affects 0 rows; only then do I know I've removed all of the duplicates.

So, using a PL/pgSQL function or other control structure, how can I run the statement above multiple times until I receive "DELETE 0", then quit? I looked through the documentation, but I couldn't find how to receive the number of affected rows from the previous query using PL/pgSQL.

Any assistance you could provide would be greatly appreciated!

1 Answer

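You can fold this into a single query by ranking the duplicates with a window function in a subquery. Note that the self-join returns each row once per matching duplicate, so with three or more copies row_number() would assign a number above 1 to a repeat of the lowest objectid and delete it as well; dense_rank() gives every repeat of the same objectid the same rank and avoids that:

delete from landusetable
    where objectid in (select xy.objectid
                       from (select x.objectid,
                                    DENSE_RANK() over (partition by x.shape order by x.objectid) as seqnum
                             from landusetable x JOIN
                                  landusetable y
                                  ON ST_Equals(x.shape, y.shape)
                             WHERE x.objectid <> y.objectid
                            ) xy
                       where xy.seqnum > 1
                      );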

Of course, you can also put the subquery into a CTE if you prefer.
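
As a rough sketch of the CTE form, assuming PostgreSQL 9.1 or later (where a WITH clause can be attached to DELETE) and the same landusetable, objectid, and shape names as in the question. The CTE name ranked is made up for illustration. dense_rank() is used here because the self-join repeats each row once per matching duplicate, and every repeat of the lowest objectid should keep rank 1:

```sql
-- Rank every row that has at least one exact duplicate, lowest objectid first.
with ranked as (
    select x.objectid,
           dense_rank() over (partition by x.shape order by x.objectid) as seqnum
    from landusetable x
    join landusetable y
      on ST_Equals(x.shape, y.shape)
    where x.objectid <> y.objectid
)
-- Delete everything except the rank-1 (lowest objectid) row of each group.
delete from landusetable
where objectid in (select objectid from ranked where seqnum > 1);
```

It is worth running the select inside the CTE on its own first to check which rows would be removed.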

In this case, using "standard" SQL yields a simpler query. This version uses where exists instead of in:

delete from landusetable
    where exists (select 1
                  from landusetable lut2
                  where ST_Equals(lut2.shape, landusetable.shape) and
                        lut2.objectid < landusetable.objectid
                 )
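
If you still want the loop the question describes, PL/pgSQL exposes the number of rows affected by the previous statement through GET DIAGNOSTICS ... = ROW_COUNT. The following is only a sketch, assuming PostgreSQL 9.0 or later for the anonymous DO block; the variable name deleted is made up for illustration, and the delete statement is the one from the question, unchanged:

```sql
do $$
declare
    deleted bigint;  -- rows removed by the most recent delete
begin
    loop
        -- Original statement: removes the highest objectid of each duplicate set.
        delete from landusetable
        where objectid in (
            select max(x.objectid)
            from landusetable x
            join landusetable y
              on ST_Equals(x.shape, y.shape)
            where x.objectid <> y.objectid
            group by x.shape
        );

        -- Read how many rows the delete above affected.
        get diagnostics deleted = row_count;

        -- Stop once a pass removes nothing, i.e. the "DELETE 0" case.
        exit when deleted = 0;
    end loop;
end
$$;
```

Each pass repeats the full self-join, so a single-statement delete is likely to be cheaper on a large table.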
answered 2013-06-03T16:13:39.457