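I have some huge tables of values and dates that I'd like to compress using run-length encoding. The most obvious approach (to me) is to select every distinct combination of values together with the minimum and maximum date. The problem is that this misses any case where a mapping stops and later starts again.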

Id | Value1 | Value2 | Value3 |  DataDate
------------------------------------------
01 |   1    |   2    |   3    | 2000-01-01
01 |   1    |   2    |   3    | 2000-01-02
01 |   1    |   2    |   3    | 2000-01-03
01 |   1    |   2    |   3    | 2000-01-04
01 |   A    |   B    |   C    | 2000-01-05
01 |   A    |   B    |   C    | 2000-01-06
01 |   1    |   2    |   3    | 2000-01-07

would be encoded this way as

Id | Value1 | Value2 | Value3 |  FromDate |  ToDate
-----------------------------------------------------
01 |   1    |   2    |    3   | 2000-01-01| 2000-01-07
01 |   A    |   B    |    C   | 2000-01-05| 2000-01-06

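This is clearly wrong: the first row claims that 1/2/3 held from 2000-01-01 through 2000-01-07, even though A/B/C was in effect on 2000-01-05 and 2000-01-06.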

What I want is a query that returns every run of consecutive dates over which each set of values holds.
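To pin down the expected output, here is a minimal sketch of the desired encoding done outside the database, in plain Python. The function name `encode_runs` and the tuple layout are assumptions made for illustration only; a run is taken to end when the values change or when a day is skipped.

```python
from datetime import date, timedelta

def encode_runs(rows):
    """Collapse (id, v1, v2, v3, day) rows into (id, v1, v2, v3, from, to) runs.

    A new run starts whenever the value combination changes or the dates
    stop being consecutive. Rows are sorted by id and date first.
    """
    runs = []
    for row in sorted(rows, key=lambda r: (r[0], r[4])):
        key, day = row[:4], row[4]
        if runs and runs[-1][:4] == key and runs[-1][5] + timedelta(days=1) == day:
            runs[-1] = key + (runs[-1][4], day)  # extend the current run
        else:
            runs.append(key + (day, day))        # start a new run
    return runs

# The sample data from the question.
rows = (
    [(1, "1", "2", "3", date(2000, 1, d)) for d in (1, 2, 3, 4)]
    + [(1, "A", "B", "C", date(2000, 1, d)) for d in (5, 6)]
    + [(1, "1", "2", "3", date(2000, 1, 7))]
)

for run in encode_runs(rows):
    print(run)
```

On the sample data this yields three runs rather than two, because the return to 1/2/3 on 2000-01-07 is kept separate from the earlier run.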

Alternatively, if I'm going about this completely backwards, any other suggestions would be greatly appreciated.

3 Answers

Try this:

DECLARE @MyTable TABLE (
    Id INT,
    Value1 VARCHAR(10), 
    Value2 VARCHAR(10),
    Value3 VARCHAR(10), 
    DataDate DATE
);

INSERT @MyTable 
SELECT 01, '1', ' 2', '3', '2000-01-01' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-02' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-03' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-04' UNION ALL
SELECT 01, 'A', ' B', 'C', '2000-01-05' UNION ALL
SELECT 01, 'A', ' B', 'C', '2000-01-06' UNION ALL
SELECT 01, '1', ' 2', '3', '2000-01-07'

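-- Gaps-and-islands: within one run of consecutive dates, the day number and
-- the per-combination row number both grow by 1, so their difference is constant.
SELECT  Id, Value1, Value2, Value3,
        MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM (
    SELECT  x.Id, x.Value1, x.Value2, x.Value3, 
            x.DataDate,
            GroupNum = 
                DATEDIFF(DAY, 0, x.DataDate) -   -- days since 1900-01-01
                ROW_NUMBER() OVER(PARTITION BY x.Id, x.Value1, x.Value2, x.Value3 ORDER BY x.DataDate)
    FROM    @MyTable x
) y
GROUP BY Id, Value1, Value2, Value3, GroupNum;  -- one output row per run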

Result:

Id Value1 Value2 Value3 FromDate   ToDate
-- ------ ------ ------ ---------- ----------
1  1       2     3      2000-01-01 2000-01-04
1  1       2     3      2000-01-07 2000-01-07
1  A       B     C      2000-01-05 2000-01-06
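
Why subtracting a row number from a day number separates the runs can be seen in a small Python sketch. The helper `group_numbers` and the shortened keys are hypothetical, and `toordinal()` stands in for `DATEDIFF(DAY, 0, ...)`; the two differ only by a constant offset, which does not affect the grouping.

```python
from collections import defaultdict
from datetime import date

def group_numbers(rows):
    """Return (key, day, group_num), with group_num = day ordinal - row number within key."""
    counters = defaultdict(int)
    out = []
    for key, day in sorted(rows, key=lambda r: (r[0], r[1])):
        counters[key] += 1  # plays the role of ROW_NUMBER() partitioned by key
        out.append((key, day, day.toordinal() - counters[key]))
    return out

# "123" holds on days 1-4 and again on day 7; "ABC" holds on days 5-6.
rows = (
    [("123", date(2000, 1, d)) for d in (1, 2, 3, 4, 7)]
    + [("ABC", date(2000, 1, d)) for d in (5, 6)]
)

# Grouping by (key, group_num) mirrors the GROUP BY ... GroupNum in the query.
islands = defaultdict(list)
for key, day, num in group_numbers(rows):
    islands[(key, num)].append(day)

for (key, _), days in sorted(islands.items(), key=lambda kv: kv[1][0]):
    print(key, min(days), max(days))
```

While the dates stay consecutive, both the ordinal and the row number rise by one per row, so the difference stays fixed. The three-day gap before 2000-01-07 pushes the ordinal ahead of the row number, which gives that row a different group number.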
answered 2014-07-06T08:44:22.537

You will probably want to use window functions. Try something like this:

select 
    id, value1, value2, value3, 
    from_date=update_date, 
    to_date=lead(update_date) over (partition by id order by update_date)
from (
    select 
    t.*
    ,is_changed=
     case when 
        value1 <> lag(value1) over (partition by id order by update_date) or 
          (lag(value1) over (partition by id order by update_date) is null and value1 is not null) or
        value2 <> lag(value2) over (partition by id order by update_date) or 
          (lag(value2) over (partition by id order by update_date) is null and value2 is not null) or
        value3 <> lag(value3) over (partition by id order by update_date) or 
          (lag(value3) over (partition by id order by update_date) is null and value3 is not null) 
     then 1 else 0 end
    from test t
) t2
where is_changed = 1
order by id, update_date

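Note that this query relies on the LAG() function, and on two other things:

  • a separate test for each "value" column; if you have many columns to test, you could consider building a hash value to simplify the equality check
  • the "to_date" is the same as the "from_date" of the next record, which means you may need to test values using >= from_date and < to_date so that the runs are mutually exclusive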
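
The second point can be illustrated with a short Python sketch of a half-open lookup. The `runs` list and the `value_on` helper are hypothetical; the dates assume the sample data is read as month/day/year, and the final run has no `to_date`, mirroring the NULL that LEAD() returns when there is no following row.

```python
from datetime import date

# (value, from_date, to_date); to_date is None for the last, still-open run.
runs = [
    ("ABC", date(2014, 1, 1), date(2014, 3, 1)),
    ("XYZ", date(2014, 3, 1), date(2014, 4, 1)),
    ("ABC", date(2014, 4, 1), None),
]

def value_on(runs, day):
    """Return the value in effect on `day`, testing from_date <= day < to_date."""
    for value, from_date, to_date in runs:
        if from_date <= day and (to_date is None or day < to_date):
            return value
    return None  # day is earlier than the first run

# 2014-03-01 is both the to_date of the first run and the from_date of the
# second; the half-open test assigns it to the second run only.
print(value_on(runs, date(2014, 3, 1)))
```

With a closed test (`<= to_date`) the boundary date would match two runs at once, which is the overlap the answer warns about.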

Note that I used the following sample data in my test:

create table test(id int, value1 varchar(3), value2 varchar(3), value3 varchar(3), update_date datetime)

insert into test values
(1, 'A', 'B', 'C', '1/1/2014'),
(1, 'A', 'B', 'C', '2/1/2014'),
(1, 'X', 'Y', 'Z', '3/1/2014'),
(1, 'A', 'B', 'C', '4/1/2014'),
(2, 'D', 'E', 'F', '1/1/2014'),
(2, 'D', 'E', 'F', '6/1/2014')

Good luck!

answered 2014-07-06T02:07:58.983

Try this:

SELECT  Id, Value1, Value2, Value3, MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM YourTable
GROUP BY Id, Value1, Value2, Value3
answered 2014-07-06T03:49:35.050