I want to count the number of groups of 2 or more consecutive weeks that have negative values, within a range of weeks.

Example:

Week | Value
201301 | 10
201302 | -5 <--| both weeks have negative values and are consecutive
201303 | -6 <--| 

 Week | Value
201301 | 10
201302 | -5 
201303 | 7
201304 | -2 <-- negative but not consecutive to the last negative value in 201302 

 Week | Value
201301 | 10
201302 | -5 
201303 | -7
201304 | -2 <-- 1st group of negative and consecutive values 
201305 | 0
201306 | -12
201307 | -8 <-- 2nd group of negative and consecutive values 

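Is there a better way to do this than using a cursor, resetting a variable, and checking each row in order?

Here is some SQL I set up to try to test it: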

IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestOne') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestOne
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestTwo') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestTwo

CREATE TABLE #ConsecutiveNegativeWeekTestOne
(
     [Week] INT NOT NULL
     ,[Value] DECIMAL(18,6) NOT NULL
)

-- I have a condition where I expect to see at least 2 consecutive weeks with negative values
-- TRUE : Week 201328 & 201329 are both negative.
INSERT INTO #ConsecutiveNegativeWeekTestOne
VALUES
(201327, 5)
,(201328,-11)
,(201329,-18)
,(201330, 25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, 59)
,(201336, 0)
,(201337, 0)

SELECT * FROM #ConsecutiveNegativeWeekTestOne
WHERE Value < 0
ORDER BY [Week] ASC


CREATE TABLE #ConsecutiveNegativeWeekTestTwo
(
     [Week] INT NOT NULL
     ,[Value] DECIMAL(18,6) NOT NULL
)

-- FALSE: The negative weeks are not consecutive
INSERT INTO #ConsecutiveNegativeWeekTestTwo
VALUES

(201327, 5)
,(201328,-11)
,(201329,20)
,(201330, -25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, -15)
,(201336, 0)
,(201337, 0)

SELECT * FROM #ConsecutiveNegativeWeekTestTwo
WHERE Value < 0
ORDER BY [Week] ASC

My SQL Fiddle is also here: http://sqlfiddle.com/#!3/ef54f/2

3 Answers

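First, would you please share the formula for calculating the week number, or provide a real date for each week, or some method to determine whether any particular year has 52 or 53 weeks? Once you do that, I can make my queries properly skip missing data and cross year boundaries.

Now to queries: this can be done without a JOIN, which, depending on the exact indexes present, may improve performance over any query that uses JOINs. Then again, it may not. It is also harder to understand, so it may not be worth it if other solutions perform well enough (especially when the right indexes are present).

Simulate a PREORDER BY windowing function (respects gaps, ignores year boundaries):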

WITH Calcs AS (
   SELECT
      Grp =
         [Week] -- comment out to ignore gaps and gain year boundaries
         -- Row_Number() OVER (ORDER BY [Week]) -- swap with previous line
         - Row_Number() OVER
            (PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
      *
   FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
   [Week] = Min([Week])
   -- NumWeeks = Count(*) -- if you want the count
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;

See a Live Demo at SQL Fiddle (1st query)
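To make the Grp expression above easier to follow, here is a minimal sketch run against the third example from the question. The inline Sample rows and the Seq column are illustrative assumptions, and the sketch filters with a plain WHERE instead of the PARTITION BY (SELECT 1 WHERE Value < 0) trick, so it shows the idea rather than the exact query above.

```sql
-- Illustration only: inline sample rows stand in for the real table.
WITH Sample ([Week], Value) AS (
   SELECT *
   FROM (VALUES
      (201301, 10), (201302, -5), (201303, -7), (201304, -2),
      (201305, 0), (201306, -12), (201307, -8)
   ) V ([Week], Value)
)
SELECT
   [Week],
   Value,
   -- Numbers only the negative rows: 1, 2, 3, ...
   Seq = Row_Number() OVER (ORDER BY [Week]),
   -- Week minus sequence number stays constant inside an unbroken run
   Grp = [Week] - Row_Number() OVER (ORDER BY [Week])
FROM Sample
WHERE Value < 0
ORDER BY [Week];
```

Weeks 201302 through 201304 all get Grp 201301, while 201306 and 201307 get Grp 201302, so grouping by Grp and keeping groups with Count(*) >= 2 yields one row per run.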

Another way, simulating LAG/LEAD with a CROSS JOIN and aggregates (respects gaps, ignores year boundaries):

WITH Groups AS (
   SELECT
      Grp = T.[Week] + X.Num,
      *
   FROM
      dbo.ConsecutiveNegativeWeekTestOne T
      CROSS JOIN (VALUES (-1), (0), (1)) X (Num)
)
SELECT
   [Week] = Min(C.[Week])
   -- Value = Min(C.Value)
FROM
   Groups G
   OUTER APPLY (SELECT G.* WHERE G.Num = 0) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
   Min(G.[Week]) = Min(C.[Week])
   AND Max(G.[Week]) > Min(C.[Week])
;

See a Live Demo at SQL Fiddle (2nd query)

And, my original second query, but simplified (ignores gaps, handles year boundaries):

WITH Groups AS (
   SELECT
      Grp = (Row_Number() OVER (ORDER BY T.[Week]) + X.Num) / 3,
      *
   FROM
      dbo.ConsecutiveNegativeWeekTestOne T
      CROSS JOIN (VALUES (0), (2), (4)) X (Num)
)
SELECT
   [Week] = Min(C.[Week])
   -- Value = Min(C.Value)
FROM
   Groups G
   OUTER APPLY (SELECT G.* WHERE G.Num = 2) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
   Min(G.[Week]) = Min(C.[Week])
   AND Max(G.[Week]) > Min(C.[Week])
;

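Note: The execution plan for these may be rated as more expensive than other queries, but there is only 1 table access instead of 2 or 3, and while the CPU may be higher, it is still respectably low.

Note: I originally was not paying attention to producing only one row per group of negative values, so I produced this query, which requires only 2 table accesses (respects gaps, ignores year boundaries):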

SELECT
   T1.[Week]
FROM
   dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
   Value < 0
   AND EXISTS (
      SELECT *
      FROM dbo.ConsecutiveNegativeWeekTestOne T2
      WHERE
         T2.Value < 0
         AND T2.[Week] IN (T1.[Week] - 1, T1.[Week] + 1)
   )
;

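See a Live Demo at SQL Fiddle (3rd query)

However, I have now modified it to perform as required, showing only each starting week (respects gaps, ignores year boundaries):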

SELECT
   T1.[Week]
FROM
   dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
   Value < 0
   AND EXISTS (
      SELECT *
      FROM
         dbo.ConsecutiveNegativeWeekTestOne T2
      WHERE
         T2.Value < 0
         AND T1.[Week] - 1 <= T2.[Week]
         AND T1.[Week] + 1 >= T2.[Week]
         AND T1.[Week] <> T2.[Week]
      HAVING
         Min(T2.[Week]) > T1.[Week]
   )
;

See a Live Demo at SQL Fiddle (3rd query)

Last, just for fun, here is a SQL Server 2012 and up version using LEAD and LAG:

WITH Weeks AS (
   SELECT
      PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
      SubsValue = Lead(Value, 1, 0) OVER (ORDER BY [Week]),
      PrevWeek = Lag(Week, 1, 0) OVER (ORDER BY [Week]),
      SubsWeek = Lead(Week, 1, 0) OVER (ORDER BY [Week]),
      *
   FROM
     dbo.ConsecutiveNegativeWeekTestOne
)
SELECT @Week = [Week] -- @Week is declared in the test setup script further down; use SELECT [Week] to return rows
FROM Weeks W
WHERE
   (
      [Week] - 1 > PrevWeek
      OR PrevValue >= 0
   )
   AND Value < 0
   AND SubsValue < 0
   AND [Week] + 1 = SubsWeek
;

See a Live Demo at SQL Fiddle (4th query)

I am not sure I am doing this the best way, as I have not used these functions much, but it works nonetheless.

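You should do some performance testing of the various queries presented to you and pick the best one, considering that code should be, in order:

  1. Correct
  2. Clear
  3. Concise
  4. Fast

Seeing that some of my solutions are not very clear, other solutions that are fast enough and concise enough will probably win the competition for which one to use in your own production code. But... maybe not! And maybe someone will appreciate seeing these techniques, even if they can't be used as-is this time.

So let's do some testing and see what the truth is about all this! Here is a test setup script. It will generate the same data on your own server as it did on mine: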

IF Object_ID('dbo.ConsecutiveNegativeWeekTestOne', 'U') IS NOT NULL DROP TABLE dbo.ConsecutiveNegativeWeekTestOne;
GO
CREATE TABLE dbo.ConsecutiveNegativeWeekTestOne (
   [Week] int NOT NULL CONSTRAINT PK_ConsecutiveNegativeWeekTestOne PRIMARY KEY CLUSTERED,
   [Value] decimal(18,6) NOT NULL
);

SET NOCOUNT ON;

DECLARE
   @f float = Rand(5.1415926535897932384626433832795028842),
   @Dt datetime = '17530101',
   @Week int;

WHILE @Dt <= '20140106' BEGIN
   INSERT dbo.ConsecutiveNegativeWeekTestOne
   SELECT
      Format(@Dt, 'yyyy') + Right('0' + Convert(varchar(11), DateDiff(day, DateAdd(year, DateDiff(year, 0, @Dt), 0), @Dt) / 7 + 1), 2),
      Rand() * 151 - 76
   ;
   SET @Dt = DateAdd(day, 7, @Dt);
END;

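This generates 13,620 weeks, from 175301 through 201401. I modified all the queries to select the Week values instead of the count, in the format SELECT @Week = Expression ... so that the tests are not affected by returning rows to the client.

I tested only the gap-respecting, non-year-boundary-handling versions.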
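The answer does not say how Duration, CPU, and Reads were captured. One common way to collect comparable numbers, sketched here as an assumption rather than the author's actual method, is to wrap each candidate query in SET STATISTICS IO and SET STATISTICS TIME and assign the result to a variable so no rows are sent to the client. The first query from this answer serves as the example, and the script is assumed to run in its own batch.

```sql
SET NOCOUNT ON;
SET STATISTICS IO ON;    -- reports logical reads per table
SET STATISTICS TIME ON;  -- reports CPU and elapsed time in milliseconds

DECLARE @Week int;       -- sink variable: no result set travels to the client

WITH Calcs AS (
   SELECT
      Grp =
         [Week]
         - Row_Number() OVER
            (PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
      *
   FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
   @Week = Min([Week])   -- assigned once per qualifying group
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The figures appear on the Messages tab in SSMS. Running each query several times and discarding the first run helps separate plan compilation and cold-cache effects from the steady-state cost.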

Results

             Query  Duration  CPU    Reads
------------------  --------  -----  ------
    ErikE-Preorder   27        31       40
       ErikE-CROSS   29        31       40
     ErikE-Join-IN   -------Awful---------
ErikE-Join-Revised   46        47    15069
    ErikE-Lead-Lag  104       109       40
              jods   12        16      120
  Transact Charlie   12        16      120

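Conclusions

  1. The reduced reads of the non-JOIN versions are not significant enough to warrant their increased complexity.

  2. The table is so small that performance almost doesn't matter. 261 years of weeks is insignificant, so a normal business operation won't see any performance problem even with a poor query.

  3. I tested with an index on Week (which is more than reasonable). Doing two separate JOINs, each with a seek, was far superior to any device that tries to get the relevant related data in one swoop. Charlie and jods were spot on in their comments.

  4. This data is not large enough to expose real differences between the queries in CPU and duration. The values above are representative, though at times the 31 ms readings were 16 ms and the 16 ms readings were 0 ms. Since the timer resolution is about 15 ms, this doesn't tell us much.

  5. My tricky query techniques do perform better in terms of reads (40 versus 120). They may be worth it in performance-critical situations. But this is not one of them.

  6. Lead and Lag may not always win. The presence of an index on the lookup value is probably what determines this. Their ability to pull the previous or next value in a given order, even when the ordering values are not sequential, may be one good use case for these functions.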
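As a sketch of the use case mentioned in point 6, the variant below compares each row only with its neighbours in Week order and does no week arithmetic at all. It therefore treats 201252 and 201301 as adjacent, but it also treats rows on either side of a missing week as adjacent, so it assumes every week has a row. The column name NextValue is illustrative, and the query needs SQL Server 2012 or later.

```sql
WITH Weeks AS (
   SELECT
      [Week],
      Value,
      -- Default of 0 (non-negative) makes the first and last rows act as run boundaries
      PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
      NextValue = Lead(Value, 1, 0) OVER (ORDER BY [Week])
   FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT [Week]            -- first week of each run of 2 or more negative rows
FROM Weeks
WHERE
   Value < 0
   AND PrevValue >= 0    -- the run starts here
   AND NextValue < 0     -- and continues for at least one more row
;
```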

answered 2013-06-13T23:31:19.013

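You could use a combination of EXISTS checks.

Assuming you only want to know the groups (series of consecutive weeks that are all negative):

-- Find the possible start weeks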

;WITH starts as (
    SELECT [Week]
    FROM #ConsecutiveNegativeWeekTestOne AS s
    WHERE s.[Value] < 0
      AND NOT EXISTS (
        SELECT 1
        FROM #ConsecutiveNegativeWeekTestOne AS p
        WHERE p.[Week] = s.[Week] - 1
          AND p.[Value] < 0
        )
    )
SELECT COUNT(*)
FROM
    Starts AS s
    WHERE EXISTS (
        SELECT 1
        FROM #ConsecutiveNegativeWeekTestOne AS n
        WHERE n.[Week] = s.[Week] + 1
          AND n.[Value] < 0
        )

If you have an index on Week, this query should even be moderately efficient.
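The temp tables in the question are created without any index. A minimal sketch of adding one follows; the index name is hypothetical, and UNIQUE assumes there is at most one row per week.

```sql
-- Hypothetical index name. A clustered index also carries Value,
-- so each EXISTS probe can be answered by a single seek.
CREATE UNIQUE CLUSTERED INDEX IX_ConsecutiveNegativeWeekTestOne_Week
   ON #ConsecutiveNegativeWeekTestOne ([Week]);
```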

answered 2013-06-13T21:33:58.267

You can replace LEAD and LAG with a self-join.

The idea is to count the starts of negative sequences rather than trying to consider every row.

SELECT COUNT(*)
FROM ConsecutiveNegativeWeekTestOne W
LEFT OUTER JOIN ConsecutiveNegativeWeekTestOne Prev
  ON W.week = Prev.week + 1
INNER JOIN ConsecutiveNegativeWeekTestOne Next
  ON W.week = Next.week - 1
WHERE W.value < 0 
  AND (Prev.value IS NULL OR Prev.value >= 0)  -- 0 is not negative, so it also ends a run
  AND Next.value < 0

Note that I simply did "week + 1", which will not work when the year changes.
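One possible way around the year-change limitation, sketched under the assumption that the table has a row for every week (no missing weeks), is to join on a Row_Number sequence instead of on week + 1. The CTE name Numbered, the column Seq, and the alias Nxt are illustrative.

```sql
WITH Numbered AS (
   SELECT
      [Week],
      Value,
      -- Dense 1, 2, 3, ... sequence in week order, unaffected by the 52/53-week rollover
      Seq = Row_Number() OVER (ORDER BY [Week])
   FROM ConsecutiveNegativeWeekTestOne
)
SELECT COUNT(*)
FROM Numbered W
LEFT OUTER JOIN Numbered Prev
  ON Prev.Seq = W.Seq - 1
INNER JOIN Numbered Nxt
  ON Nxt.Seq = W.Seq + 1
WHERE W.Value < 0
  AND (Prev.Value IS NULL OR Prev.Value >= 0)  -- W starts a run
  AND Nxt.Value < 0;                           -- the run has at least 2 weeks
```

If weeks can be missing, a calendar table that maps each week code to a running number would be a safer basis for the join than Row_Number over the data itself.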

answered 2013-06-13T21:45:49.877