I have some records:

+----+------+---------------------+
| id | data | time                |
+----+------+---------------------+
|  1 |    1 | 2013-04-22 16:18:07 |
|  2 |    1 | 2013-04-22 16:18:17 |
|  3 |    2 | 2013-04-22 16:18:27 |
|  4 |    2 | 2013-04-22 16:18:37 |
|  5 |    1 | 2013-04-22 16:18:47 |
|  6 |    1 | 2013-04-22 16:18:57 |
|  7 |    1 | 2013-04-22 16:19:07 |
|  8 |    3 | 2013-04-22 16:19:17 |
|  9 |    3 | 2013-04-22 16:19:27 |
| 10 |    1 | 2013-04-22 16:19:37 |
| 11 |    2 | 2013-04-22 16:19:47 |
| 12 |    2 | 2013-04-22 16:19:57 |
| 13 |    3 | 2013-04-22 16:20:07 |
| 14 |    3 | 2013-04-22 16:20:17 |
+----+------+---------------------+

How can I get these records?

+----+------+---------------------+
| id | data | time                |
+----+------+---------------------+
|  1 |    1 | 2013-04-22 16:18:07 |
|  3 |    2 | 2013-04-22 16:18:27 |
|  5 |    1 | 2013-04-22 16:18:47 |
|  8 |    3 | 2013-04-22 16:19:17 |
| 10 |    1 | 2013-04-22 16:19:37 |
| 11 |    2 | 2013-04-22 16:19:47 |
| 13 |    3 | 2013-04-22 16:20:07 |
+----+------+---------------------+

I want to select the first entry of each subgroup, but if I use DISTINCT, I get this set of records instead:

+----+------+---------------------+
| id | data | time                |
+----+------+---------------------+
|  1 |    1 | 2013-04-22 16:18:07 |
|  3 |    2 | 2013-04-22 16:18:27 |
|  8 |    3 | 2013-04-22 16:19:17 |
+----+------+---------------------+

3 Answers

The problem here is that you need to define the groups you are looking at. The same "data" value repeats across different groups.

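Here is a way to find each group. Assign each row a sequential number ordered by time. Then assign a second sequential number, also ordered by time, within each data value. While rows with the same data value are consecutive, the difference between these two numbers stays constant.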

The following applies this idea to your data. Once the groups are identified, it uses group by to get the data:

select data, MIN(time) as time
from (select t.*,
             (ROW_NUMBER() over (order by time) -
              ROW_NUMBER() over (partition by data order by time)
             ) as thegroup
      from t
     ) t
-- the difference alone can coincide for different data values, so group by both
group by data, thegroup
order by MIN(time)
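
The row-number difference can be checked against the question's sample data. The following is a minimal sketch, not the answerer's own code: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) as a stand-in for PostgreSQL, and the name first_of_each_run is made up for illustration. Because the difference alone can coincide for different data values (here ids 3-4 and ids 5-7 both get a difference of 2), the sketch groups by data together with the difference.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample data from the question: one row every 10 seconds.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)
ROWS = [
    (i + 1, d, (START + timedelta(seconds=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
    for i, d in enumerate(DATA)
]

QUERY = """
    SELECT MIN(id) AS id, data, MIN(time) AS time
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY time) -
                 ROW_NUMBER() OVER (PARTITION BY data ORDER BY time) AS thegroup
          FROM t) AS s
    GROUP BY data, thegroup   -- data is needed: thegroup alone is not unique
    ORDER BY MIN(time)
"""


def first_of_each_run(rows):
    """Return (id, data, time) for the first row of each consecutive run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_of_each_run(ROWS):
        print(row)
```

Run as a script, this prints one row per run; the ids should match the seven rows asked for in the question.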

If you want to keep more columns, you can enumerate the rows in each group and take the first one:

select data, time
from (select t.*,
             -- number the rows within each (data, thegroup) run
             ROW_NUMBER() over (partition by data, thegroup order by time) as seqnum
      from (select t.*,
                   (ROW_NUMBER() over (order by time) -
                    ROW_NUMBER() over (partition by data order by time)
                   ) as thegroup
            from t
           ) t
     ) t
where seqnum = 1
order by time

You can also do this using Postgres's distinct on syntax.

answered 2013-04-22T17:43:10.837

Here is a simpler and more efficient version:

SELECT 
  *
FROM 
  (
    SELECT 
      id, 
      data, 
      time, 
      lag( id, 1 ) over( partition by data ORDER BY id ) as prev_id
    FROM t 
  ) t
WHERE 
  prev_id is null 
  OR id - prev_id > 1
ORDER BY
  id

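Since you need the first row from each group, I used the PostgreSQL window function lag() to generate a column named prev_id, as shown below (the table below covers only the records where data is 1; a similar table is produced for each of the other data values):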

+----+---------+
| id | prev_id |
+----+---------+
| 1  | NULL    |  This row is valid as lag is NULL
| 2  | 1       |
| 3  | 2       |
| 5  | 3       |  This row is valid as diff is > 1 (between previous_id and current_id)
| 6  | 5       |
| 7  | 6       |
| 10 | 7       |  This row is valid as diff is > 1 (between previous_id and current_id)
+----+---------+

In either of the two cases above, that is, when lag is NULL or id - lag > 1 is true, I treat the row as the start row of a group.
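
The lag() check can be tried on the question's sample data. This is a rough sketch under stated assumptions: it uses Python's built-in sqlite3 module with SQLite 3.25 or later as a stand-in for PostgreSQL, and the names load_sample, run_starts, LAG_ON_ID and LAG_ON_DATA are hypothetical. LAG_ON_ID is the query from this answer and relies on ids being consecutive with no gaps. LAG_ON_DATA is a variant that is not part of the answer: it compares each row's data with the previous row's data in time order, so it does not depend on the ids.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample data from the question: one row every 10 seconds.
DATA = [1, 1, 2, 2, 1, 1, 1, 3, 3, 1, 2, 2, 3, 3]
START = datetime(2013, 4, 22, 16, 18, 7)

# Query from the answer: a row starts a run if it has no previous id
# for the same data value, or if that previous id is not adjacent.
LAG_ON_ID = """
    SELECT *
    FROM (SELECT id, data, time,
                 lag(id, 1) OVER (PARTITION BY data ORDER BY id) AS prev_id
          FROM t) AS s
    WHERE prev_id IS NULL OR id - prev_id > 1
    ORDER BY id
"""

# Variant (an assumption, not from the answer): a row starts a run
# if the previous row in time order has a different data value.
LAG_ON_DATA = """
    SELECT id, data, time
    FROM (SELECT id, data, time,
                 lag(data) OVER (ORDER BY time) AS prev_data
          FROM t) AS s
    WHERE prev_data IS NULL OR prev_data <> data
    ORDER BY id
"""


def load_sample():
    """Create an in-memory table t holding the question's 14 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, data INTEGER, time TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (i + 1, d, (START + timedelta(seconds=10 * i)).strftime("%Y-%m-%d %H:%M:%S"))
            for i, d in enumerate(DATA)
        ],
    )
    return conn


def run_starts(query):
    """Return the ids of the rows that the given query marks as run starts."""
    conn = load_sample()
    try:
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_starts(LAG_ON_ID))
    print(run_starts(LAG_ON_DATA))
```

On this sample both queries should select the same rows. If ids can have gaps (for example after deletes), only the data-based variant would still detect run boundaries reliably.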

answered 2013-04-22T18:38:33.413

Use group by data and time instead of distinct.

"group by data" groups the rows by the data field, but if you add "and time", it also groups the data groups by time.

answered 2013-04-22T16:31:47.150