1

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.

In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.

Below query is giving me the unique results with the latest time stamp. But my concern is I need to get the total of unique entries.

For example assume my table has 40 entries overall. With the below query I am able to get 20 unique rows based on the serial number. But the 'total' is returned as 40 instead of 20. Any help on this pls?

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info 
 ORDER BY 
   serial_no asc

enter image description here

product_info table intially has this data  

serial_no           name         timestamp                              

11212               pulp12      2018-06-01 20:00:01             
11213               mango       2018-06-01 17:00:01             
11214               grapes      2018-06-02 04:00:01             
11215               orange      2018-06-02 07:05:30             
11212               pulp12      2018-06-03 14:00:01             
11213               mango       2018-06-03 13:00:00             



After the distict query I got all unique results based on the latest 
timestamp:

serial_no       name        timestamp                   total

11212           pulp12     2018-06-03 14:00:01            6
11213           mango      2018-06-03 13:00:00            6
11214           grapes     2018-06-02 04:00:01            6
11215           orange     2018-06-02 07:05:30            6


But total is appearing as 6 . I wanted the total to be 4 since it has 
only 4 unique entries.

I am not sure how to modify my existing query to get this desired 
result.
4

3 回答 3

2

您可以做的是将窗口函数移动到更高级别的选择语句。这是因为在应用 distinct on 和 limit 子句之前评估窗口函数。此外,您不能DISTINCT在窗口函数中包含关键字 - 它尚未实现(从 Postgres 9.6 开始)。

 SELECT 
  *,
  COUNT(*) OVER() as total -- here
 FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC
    LIMIT 
      10
  ) AS my_info

此外,此处不需要偏移量,并且再进行一次排序也是多余的。我已经删除了这些。

另一种方法是在 select 子句中包含一个计算列,但这不会像需要再次扫描表那样快。这显然是假设您的总数严格连接到您的结果集,而不是超出存储在表中的内容,而是被过滤掉了。

于 2018-06-01T21:10:19.653 回答
2

Postgres 支持COUNT(DISTINCT column_name),所以如果我理解了你的要求,使用它而不是COUNT(*)将工作,你可以放弃OVER.

于 2018-06-01T21:12:27.073 回答
0
select count(*), serial_no from product_info group by serial_no

将为您提供每个序列号的重复数量

合并该信息的最愚蠢的方法是加入子查询

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info
  join (select count(*) as counts, serial_no from product_info group by serial_no) as X
  on X.serial_no = my_info.serial_no
 ORDER BY 
   serial_no asc
于 2018-06-01T20:03:44.707 回答