postgresql - Postgres : Need distinct records count

Question

I have a table with duplicate entries, and the objective is to get the distinct entries based on the latest timestamp.

In my case 'serial_no' has duplicate entries, and I select the unique entries based on the latest timestamp.

The query below gives me the unique results with the latest timestamp, but I also need the total number of unique entries.

For example, assume my table has 40 entries overall. With the query below I get 20 unique rows based on the serial number, but 'total' is returned as 40 instead of 20. Any help on this, please?

  SELECT
    *
  FROM
  (
    SELECT
      DISTINCT ON (serial_no) id,
      serial_no,
      name,
      timestamp,
      COUNT(*) OVER() as total
    FROM
      product_info
      INNER JOIN my.account ON id = accountid
    WHERE
      lower(name) = 'hello'
    ORDER BY
      serial_no,
      timestamp DESC
    OFFSET 0
    LIMIT 10
  ) AS my_info
  ORDER BY
    serial_no asc

The product_info table initially has this data:

serial_no           name         timestamp                              

11212               pulp12      2018-06-01 20:00:01             
11213               mango       2018-06-01 17:00:01             
11214               grapes      2018-06-02 04:00:01             
11215               orange      2018-06-02 07:05:30             
11212               pulp12      2018-06-03 14:00:01             
11213               mango       2018-06-03 13:00:00             



After the DISTINCT ON query I get all unique results based on the latest
timestamp:

serial_no       name        timestamp                   total

11212           pulp12     2018-06-03 14:00:01            6
11213           mango      2018-06-03 13:00:00            6
11214           grapes     2018-06-02 04:00:01            6
11215           orange     2018-06-02 07:05:30            6


But total appears as 6. I want the total to be 4, since there are
only 4 unique entries.

I am not sure how to modify my existing query to get this
result.

score 2 · Accepted Answer

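What you can do is move the window function to a higher-level SELECT statement. This is needed because window functions are evaluated before the DISTINCT ON and LIMIT clauses are applied. Also, you cannot use the DISTINCT keyword inside a window function; it has not been implemented (as of Postgres 9.6).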

  SELECT
    *,
    COUNT(*) OVER() as total -- here: counts the rows that remain after DISTINCT ON and LIMIT
  FROM
  (
    SELECT
      DISTINCT ON (serial_no) id,
      serial_no,
      name,
      timestamp
    FROM
      product_info
      INNER JOIN my.account ON id = accountid
    WHERE
      lower(name) = 'hello'
    ORDER BY
      serial_no,
      timestamp DESC
    LIMIT 10
  ) AS my_info

Also, the OFFSET is not needed here, and sorting a second time is redundant. I have removed both.
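To see the evaluation order at work, here is a minimal sketch that runs against the sample rows from the question. It makes several assumptions: the table is replaced by an inline VALUES list, the join to my.account and the name filter are left out, and the column aliases total_before_distinct and total_after_distinct are made up for illustration.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH product_info (serial_no, name, timestamp) AS (
  VALUES
    (11212, 'pulp12', '2018-06-01 20:00:01'::timestamp),
    (11213, 'mango',  '2018-06-01 17:00:01'::timestamp),
    (11214, 'grapes', '2018-06-02 04:00:01'::timestamp),
    (11215, 'orange', '2018-06-02 07:05:30'::timestamp),
    (11212, 'pulp12', '2018-06-03 14:00:01'::timestamp),
    (11213, 'mango',  '2018-06-03 13:00:00'::timestamp)
)
SELECT
  *,
  -- Evaluated on the subquery's output, so it sees the de-duplicated rows: 4
  COUNT(*) OVER () AS total_after_distinct
FROM (
  SELECT DISTINCT ON (serial_no)
    serial_no,
    name,
    timestamp,
    -- Evaluated before DISTINCT ON removes duplicates, so it sees all rows: 6
    COUNT(*) OVER () AS total_before_distinct
  FROM product_info
  ORDER BY serial_no, timestamp DESC
) AS my_info
ORDER BY serial_no
LIMIT 10;
```

In this sketch the LIMIT sits in the outer query. Because window functions are also evaluated before LIMIT at the same query level, the outer count then covers every distinct serial number. With the LIMIT inside the subquery, the outer count can never exceed the limit, which may or may not be what you want for paging.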

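Another way would be to include a computed column in the SELECT clause, but that would not be as fast because it requires another scan of the table. This obviously assumes that your total is strictly tied to your result set, and not to rows that are stored in the table but get filtered out.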
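One possible reading of the computed-column idea is a scalar subquery in the SELECT list. The sketch below assumes that reading; it uses the sample rows from the question as an inline VALUES list and leaves out the join and the name filter.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH product_info (serial_no, name, timestamp) AS (
  VALUES
    (11212, 'pulp12', '2018-06-01 20:00:01'::timestamp),
    (11213, 'mango',  '2018-06-01 17:00:01'::timestamp),
    (11214, 'grapes', '2018-06-02 04:00:01'::timestamp),
    (11215, 'orange', '2018-06-02 07:05:30'::timestamp),
    (11212, 'pulp12', '2018-06-03 14:00:01'::timestamp),
    (11213, 'mango',  '2018-06-03 13:00:00'::timestamp)
)
SELECT DISTINCT ON (serial_no)
  serial_no,
  name,
  timestamp,
  -- Scalar subquery: reads product_info a second time and returns 4 here
  (SELECT COUNT(DISTINCT serial_no) FROM product_info) AS total
FROM product_info
ORDER BY serial_no, timestamp DESC
LIMIT 10;
```

Any WHERE condition on the main query would have to be repeated inside the subquery, otherwise the total counts serial numbers that the main query filters out.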

score 2 · Accepted Answer

Postgres supports COUNT(DISTINCT column_name), so if I have understood your requirement, using that instead of COUNT(*) will work, and you can drop the OVER.
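As a rough illustration, here is COUNT(DISTINCT ...) on the sample rows from the question, supplied as an inline VALUES list (an assumption made so the query runs on its own). Without OVER it is an ordinary aggregate: it returns a single row and cannot sit next to non-aggregated columns unless they are grouped, so it is shown as a separate query.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH product_info (serial_no, name, timestamp) AS (
  VALUES
    (11212, 'pulp12', '2018-06-01 20:00:01'::timestamp),
    (11213, 'mango',  '2018-06-01 17:00:01'::timestamp),
    (11214, 'grapes', '2018-06-02 04:00:01'::timestamp),
    (11215, 'orange', '2018-06-02 07:05:30'::timestamp),
    (11212, 'pulp12', '2018-06-03 14:00:01'::timestamp),
    (11213, 'mango',  '2018-06-03 13:00:00'::timestamp)
)
-- One row, one column: the number of distinct serial numbers (4 here)
SELECT COUNT(DISTINCT serial_no) AS total
FROM product_info;
```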

score 0 · Accepted Answer

select count(*), serial_no from product_info group by serial_no

will give you the number of duplicates for each serial number.
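For reference, a sketch of the same GROUP BY query against the sample rows from the question, supplied as an inline VALUES list. The alias counts and the final ORDER BY are additions for readability.

```sql
-- Sample rows from the question, supplied inline so the query runs on its own.
WITH product_info (serial_no, name, timestamp) AS (
  VALUES
    (11212, 'pulp12', '2018-06-01 20:00:01'::timestamp),
    (11213, 'mango',  '2018-06-01 17:00:01'::timestamp),
    (11214, 'grapes', '2018-06-02 04:00:01'::timestamp),
    (11215, 'orange', '2018-06-02 07:05:30'::timestamp),
    (11212, 'pulp12', '2018-06-03 14:00:01'::timestamp),
    (11213, 'mango',  '2018-06-03 13:00:00'::timestamp)
)
-- Rows per serial number: 11212 -> 2, 11213 -> 2, 11214 -> 1, 11215 -> 1
SELECT serial_no, COUNT(*) AS counts
FROM product_info
GROUP BY serial_no
ORDER BY serial_no;
```

This is a per-serial-number count, not the overall total of 4 that the question asks for.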

The dumbest way to merge that information is to join a subquery:

  SELECT
    *
  FROM
  (
    SELECT
      DISTINCT ON (serial_no) id,
      serial_no,
      name,
      timestamp,
      COUNT(*) OVER() as total
    FROM
      product_info
      INNER JOIN my.account ON id = accountid
    WHERE
      lower(name) = 'hello'
    ORDER BY
      serial_no,
      timestamp DESC
    OFFSET 0
    LIMIT 10
  ) AS my_info
  -- X holds the number of rows per serial number
  JOIN (SELECT count(*) AS counts, serial_no FROM product_info GROUP BY serial_no) AS X
    ON X.serial_no = my_info.serial_no
  ORDER BY
    my_info.serial_no asc -- qualified: both my_info and X expose a serial_no column
