sql - Select rows based on the uniqueness of two columns

Question

Suppose we have the following table:

orderId  productId   orderDate              amount    
1        2           2017-01-01 20:00:00    10 
1        2           2017-01-01 20:00:01    10 
1        3           2017-01-01 20:30:10    5 
1        4           2017-01-01 22:31:10    1

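The first two rows are known to be duplicates (for example, the result of buggy software), because orderId + productId must form a unique key.

I want to remove duplicates of this kind. What is the most efficient way to do this?

If it were not for the one-second difference in orderDate, we could use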

SELECT DISTINCT * FROM `table`

Alternatively, we can use GROUP BY:

SELECT `orderId`,`productId`,MIN(`orderDate`),MIN(`amount`)
FROM `table`
GROUP BY `orderId`,`productId`

With many columns, I find the latter command tedious to write. What other options are there?

Update: I am using Snowflake.

score 2 · Accepted Answer

If your DBMS supports the ROW_NUMBER window function, then

select *
from
(
  select row_number() over (partition by orderId, productId order by orderDate asc) as rn, *
  from yourtable
) a
where rn = 1
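
Since the question mentions Snowflake, the same idea can probably be written more compactly with Snowflake's QUALIFY clause, which filters on the result of a window function without a wrapping subquery (and so does not add an rn column to the output). A minimal sketch, assuming the table is named yourtable as in the query above; QUALIFY is not standard SQL, so this may not work on other systems:

```sql
-- Keep only the earliest row per (orderId, productId).
-- Assumption: Snowflake, and a table named yourtable as in the answer above.
SELECT *
FROM yourtable
QUALIFY ROW_NUMBER() OVER (PARTITION BY orderId, productId ORDER BY orderDate ASC) = 1;
```

If two rows in a group share the same orderDate, ROW_NUMBER still returns exactly one of them, but which one is not determined unless more columns are added to the ORDER BY.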

score 0 · Accepted Answer

You can use NOT EXISTS to exclude records for which a better match exists:

select * from mytable
where not exists
(
  select *
  from mytable other
  where other.orderid   = mytable.orderid
    and other.productid = mytable.productid
    and other.orderdate < mytable.orderdate
);

score 0 · Accepted Answer

It looks as if you want to get, among the records with a common orderid and productid, the record with the minimum orderdate. This can be expressed in SQL as follows:

select * from mytable t where t.orderdate = 
  (select min(t2.orderdate)
   from mytable t2
   where t2.orderid = t.orderid 
     and t2.productid = t.productid);

Note that this query does not eliminate exact duplicates in the columns orderid, productid, and orderdate. But that was not actually asked for.
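
If exact duplicates did occur, one possible adjustment is to add DISTINCT to the same query. This is only a sketch under the assumption that the duplicated rows are identical in every column; rows that tie on orderdate but differ elsewhere (for example in amount) would still both be returned:

```sql
-- Same correlated subquery as above, with DISTINCT collapsing rows
-- that are identical in every column (an assumption about the data).
select distinct t.*
from mytable t
where t.orderdate =
  (select min(t2.orderdate)
   from mytable t2
   where t2.orderid = t.orderid
     and t2.productid = t.productid);
```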
