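I have a table in Access with a SKU column and a Sales column. The Sales column has gaps, where a gap is defined as a run of 3 or more consecutive blanks or zeros. Zeros are treated as blanks and should be removed. For each distinct SKU, I want to find the start and end of each contiguous range, along with its count (end - start + 1).

A small example: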

SKU         SALES
==================
ABC        6504.00
ABC        3304.23
ABC        0
ABC        0
ABC        
ABC        
ABC        403.053
ABC        3493.00
ABC        3939.02
DEF        4935.24
DEF        3037.22
DEF        
DEF        
DEF        
DEF        392.042
DEF        0
DEF        0
DEF        3493.03
DEF        8644.40
DEF        643.035
DEF        5333.22

Result set:

SKU        RANGE     START     END    COUNT
ABC        1         1         2      2-1+1=2
ABC        2         7         9      9-7+1=3
DEF        1         10        11     11-10+1=2
DEF        2         13        19     19-13+1=7

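This result set should then be joined back to the original table to eliminate any SKU rows whose range count is <= 13. For each SKU, only the range with the largest count should be kept in the table/recordset.

I am using MS Access, but could anyone demonstrate this both as an Access query and as a SQL Server query?

=================== EDIT =========================

Hi @Kevin,

I finally got the query working and it gives me the correct sales week ranges, but I now need some help joining it back to the original staging table so that only the selected rows are extracted. FYI, before running this query I updated all the sales KPI columns to replace NULL (blank) with zero.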

USE MASTER
GO

WITH Salesrows AS 
(
SELECT
    [SCOUNTRY],
    [SCHAR],
    [DESCRIPTION],
    [SALES VALUE WITH INNOVATION]=IIF([SALES VALUE WITH INNOVATION] IS NULL,0,[SALES VALUE WITH INNOVATION]),
    CONVERT(INT, SUBSTRING([WEEK], 8, 2)) Wk,
    CONVERT(INT, SUBSTRING([WEEK], 3, 4)) Yr,
    [wkno],
    ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY [WEEK]) RN
FROM STAGING
WHERE ([Level] = 'Item') 
)
,SalesRanges as 
(
SELECT *,        
    LAG([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L1,
    LAG([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L2,
    LEAD([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L5,
    LEAD([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L6
FROM SalesRows 
),
Clearcontents as
(
SELECT *,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND ISNULL(L2,0) = 0  THEN 1 ELSE 0 END) RemoveMe0,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L5,0) = 0 AND ISNULL(L6,0) = 0  THEN 1 ELSE 0 END) RemoveMe1,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND L2<>0 AND ISNULL(L5,0) = 0 AND L6<>0 THEN 1 ELSE 0 END) RemoveMe2
FROM SalesRanges
),
CleanedData AS
(
SELECT *,
     ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY yr, RN) NewRn
FROM ClearContents
WHERE RemoveMe0 != 1 and RemoveMe1 != 1 and RemoveMe2 != 1
),
WeekGaps as 
(
SELECT *,
    (NewRn - Rn) Ref
FROM CleanedData
),
CorrectWeekPeriods as 
(
SELECT 
    [SCOUNTRY], 
    [SCHAR],
    [DESCRIPTION],
    COUNT([wkno]) AS CNTWKS,
    MIN([wkno]) AS MINWEEK,
    MAX([wkno]) AS MAXWEEK,
    REF
FROM WeekGaps
GROUP BY [SCOUNTRY],[SCHAR],[DESCRIPTION],[REF]
)
SELECT 
    C.[SCOUNTRY], 
    C.[SCHAR],
    C.[DESCRIPTION],
    CONVERT(INT, SUBSTRING(yw1.yrwk ,5,2)) WEEKS,
    C.CNTWKS, 
    yw1.yrwk AS MINWEEK, 
    yw2.yrwk AS MAXWEEK
FROM CorrectWeekPeriods AS C 
INNER JOIN yearweek AS yw1 ON C.MINWEEK = yw1.rn
INNER JOIN yearweek AS yw2 ON C.MAXWEEK = yw2.rn 
--WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
--AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
--AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1)
--AND C.Description='0241004245' 
WHERE C.Description='0241004245'
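
  1. Which fields of the CTE do I need to join to the staging table fields so that only the rows for these selected periods are shown?

  2. I am sure this query can be optimized and made more concise. But how?

  3. Also, if I comment out the last WHERE clause that follows CorrectWeekPeriods above and run the query several times, I get a different number of rows each time. I checked the execution plan and did not get any errors.

If I just uncomment this WHERE clause: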

WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.

or this one:

WHERE C.Description='0241004245'

I get the correct minimum and maximum sales week ranges.

  4. Also, if I uncomment

    WHERE C.Description='0241004245'

I get the following error shown in the execution plan:

/*
Missing Index Details from SQL_Correct Gaps.sql - ABC.master (ALPHA\SIFAR (52))
The Query Processor estimates that implementing the following index could improve the query cost by 97.7228%.
*/

/*
USE [master]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[staging] ([Level],[Description])
INCLUDE ([Week],[Sales Value with Innovation],[sCountry],[sChar],[wkno])
GO
*/

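However, if I leave the last WHERE clause commented out, I do not get this error. By the way, I have already created the index above, so I do not know why it is asking me to create the same index again. Why does this happen?

Also, the last few commented-out lines are rules I tried to implement but could not write the correct code for. These are the rules:

  1. If a SKU has 2 or more sales week ranges, pick the largest one (preferably one that starts in week 1 of 2011).
  2. Exclude any range > 52, so that all ranges are <= 52.
  3. If all of a SKU's sales week ranges are > 13 and <= 52, keep only the largest one (preferably one that starts in week 1 of 2011).
  4. Exclude any range <= 13.

I hope someone can point me in the right direction (especially on my main point 1, joining back to the Staging table to extract the appropriate SKU sales week ranges).

Edit... I just uncommented the last WHERE clause again: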

WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1

and looked at the execution plan. It shows warnings on the SORT and HASH operators. The warning message is:

Operator used tempdb to spill data during execution with spill level 1

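Every time I execute the query I get a different number of rows. The query also takes about 1 minute to execute. I think it has something to do with the join to the yearweek table, but I do not know how to fix it.

Any help would be greatly appreciated.

Hi @Kevin Cook,

Here is the table definition: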

USE [master]
GO

/****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
DROP TABLE [dbo].[staging]
GO

/****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[staging](
    [Level] [varchar](5) NULL,
    [Week] [varchar](9) NULL,
    [Category] [varchar](50) NULL,
    [Manufacturer] [varchar](50) NULL,
    [Brand] [varchar](50) NULL,
    [Description] [varchar](100) NULL,
    [EAN] [varchar](100) NULL,
    [Sales Value with Innovation] [float] NULL,
    [Sales Units with Innovation] [float] NULL,
    [Price Per Item] [float] NULL,
    [Importance Value w Innovation] [float] NULL,
    [Importance Units w Innovation] [float] NULL,
    [Numeric Distribution] [float] NULL,
    [Weighted Distribution] [float] NULL,
    [Average Number of Item] [float] NULL,
    [Value] [float] NULL,
    [Volume] [float] NULL,
    [Units] [float] NULL,
    [Sales Value New Manufacturer] [float] NULL,
    [Sales Value New Brand] [float] NULL,
    [Sales Value New Line Extension] [float] NULL,
    [Sales Value New Packaging] [float] NULL,
    [Sales Value New Size] [float] NULL,
    [Sales Value New Product Form] [float] NULL,
    [Sales Value New Style Type] [float] NULL,
    [Sales Value New Flavour Fragr] [float] NULL,
    [Sales Value New Claim] [float] NULL,
    [Sales Units New Manufacturer] [float] NULL,
    [Sales Units New Brand] [float] NULL,
    [Sales Units New Line Extension] [float] NULL,
    [Sales Units New Packaging] [float] NULL,
    [Sales Units New Size] [float] NULL,
    [Sales Units New Product Form] [float] NULL,
    [Sales Units New Style Type] [float] NULL,
    [Sales Units New Flavour Fragr] [float] NULL,
    [Sales Units New Claim] [float] NULL,
    [filename] [nvarchar](260) NULL,
    [importdate] [datetime] NULL CONSTRAINT [DF_staging_importdate]  DEFAULT (getdate()),
    [sCountry] [varchar](50) NULL,
    [sChar] [varchar](50) NULL,
    [yr] [int] NULL,
    [wk] [int] NULL,
    [wkno] [int] NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO
1 Answer

This works on SQL Server 2012. To adapt it for 2008+, you would have to self-join SaleRows a couple of times in the SaleRanges CTE to do the job of the LAG function. Here is some sample data:

DECLARE @SalesTape TABLE
(   
    SKU VARCHAR(10),
    SALES DECIMAL(19,3),
    YEARWEEK VARCHAR(10)
)

INSERT INTO @SalesTape
VALUES
('ABC', 6504.00, 'W 2011 01'),
('ABC', 3304.23, 'W 2011 02'),
('ABC', 0, 'W 2011 03'),
('ABC', 0, 'W 2011 04'),
('ABC', null, 'W 2011 05'),
('ABC', null, 'W 2011 06'),
('ABC', 403.053, 'W 2011 07'),
('ABC', 3493.00, 'W 2011 08'),
('ABC', 3939.02, 'W 2011 09'),
('DEF', 4935.24, 'W 2011 10'),
('DEF', 3037.22, 'W 2011 11'),
('DEF', null, 'W 2011 12'),
('DEF', null, 'W 2011 13'),
('DEF', null, 'W 2011 14'),
('DEF', 392.042, 'W 2011 15'),
('DEF', 0, 'W 2011 16'),
('DEF', 0, 'W 2011 17'),
('DEF', 3493.03, 'W 2011 18'),
('DEF', 8644.40, 'W 2011 19'),
('DEF', 643.035, 'W 2011 20'),
('DEF', 5333.22, 'W 2011 21');

My first CTE just sets up some row numbers and sets the sales to 0 where they are NULL.

;WITH SaleRows AS
(
    SELECT
        SKU,
        ISNULL(SALES, 0.0) SALES,
        CONVERT(INT, SUBSTRING(YEARWEEK, 8, 2)) Wk,
        CONVERT(INT, SUBSTRING(YEARWEEK, 3, 4)) Yr,
        ROW_NUMBER() OVER (ORDER BY YEARWEEK) RN
    FROM @SalesTape
),

The second CTE builds on the first one: it looks back at the previous 2 rows and puts their sales values into columns of the CTE.

SaleRanges AS
(
    SELECT 
        SaleRows.SKU,
        SaleRows.SALES,
        SaleRows.Wk,
        SaleRows.Yr,
        SaleRows.RN,
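        -- Partition by SKU so a row never looks back into another SKU's sales
        LAG(SALES, 2) OVER (PARTITION BY SKU ORDER BY RN) L2,
        LAG(SALES, 1) OVER (PARTITION BY SKU ORDER BY RN) L1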
    FROM SaleRows 
),
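
For versions without LAG, the self-join mentioned at the top might look roughly like the sketch below. It is only an illustration: `@SaleRows` is a hypothetical stand-in for the SaleRows CTE, reduced to a single SKU and five rows, and it relies on RN being a gap-free sequence (which ROW_NUMBER guarantees).

```sql
-- Hypothetical stand-in for the SaleRows CTE: one row per week, numbered by RN.
DECLARE @SaleRows TABLE (RN INT PRIMARY KEY, SALES DECIMAL(19,3));

INSERT INTO @SaleRows (RN, SALES)
VALUES (1, 6504.00), (2, 3304.23), (3, 0), (4, 0), (5, 0);

SELECT
    cur.RN,
    cur.SALES,
    p2.SALES AS L2,   -- value two rows back, like LAG(SALES, 2) OVER (ORDER BY RN)
    p1.SALES AS L1    -- value one row back, like LAG(SALES, 1) OVER (ORDER BY RN)
FROM @SaleRows AS cur
LEFT JOIN @SaleRows AS p1 ON p1.RN = cur.RN - 1   -- NULL on the first row
LEFT JOIN @SaleRows AS p2 ON p2.RN = cur.RN - 2   -- NULL on the first two rows
ORDER BY cur.RN;
```

With more than one SKU, the join conditions would presumably also need to match on SKU, and RN would need to be numbered per SKU.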

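Now, if my row and the previous 2 rows are all 0.0, the row is marked for removal (this creates the break between periods). We then generate a new row number over the cleaned data for later use.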

ClearContent AS
(
    SELECT *, 
        CASE WHEN L1 = 0.0 AND L2 = 0.0 AND ISNULL(SALES, 0.00) = 0.0  THEN 1 ELSE 0 END RemoveMe
    FROM SaleRanges
),
CleanedData AS
(
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY RN) NewRn
    FROM ClearContent
    WHERE RemoveMe != 1
)

With the invalid rows removed, we do some math on the week versus the row offset to produce a logical period reference.

SELECT 
    SKU,
    SALES,
    Wk,
    Yr,
    (WK - NewRn) Ref
FROM CleanedData
WHERE SALES != 0.0

Here is the output:

SKU   SALES      Wk   Yr     Ref
ABC   6504.000   1    2011   0
ABC   3304.230   2    2011   0
ABC   403.053    7    2011   2
ABC   3493.000   8    2011   2
ABC   3939.020   9    2011   2
DEF   4935.240   10   2011   9
DEF   3037.220   11   2011   9
DEF   392.042    15   2011   10
DEF   3493.030   18   2011   10
DEF   8644.400   19   2011   10
DEF   643.035    20   2011   10
DEF   5333.220   21   2011   10

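Ref identifies the groups, so you only need to take the minimum and maximum Wk for each Ref to find the first and last record of each range. You could probably clean this up and simplify it, but I wanted to show the steps. Hope this helps.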
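
As a rough sketch of that last step, the aggregation could look something like this. The table variable `@Periods` and the column names `StartWk`, `EndWk`, `WeekCount` and `SizeRank` are made up for illustration; the rows are simply the SKU, Wk and Ref columns copied from the output above. It also assumes all weeks fall within one year, since it subtracts week numbers directly.

```sql
-- Hypothetical input: the (SKU, Wk, Ref) columns copied from the output above.
DECLARE @Periods TABLE (SKU VARCHAR(10), Wk INT, Ref INT);

INSERT INTO @Periods (SKU, Wk, Ref)
VALUES
('ABC', 1, 0),   ('ABC', 2, 0),
('ABC', 7, 2),   ('ABC', 8, 2),   ('ABC', 9, 2),
('DEF', 10, 9),  ('DEF', 11, 9),
('DEF', 15, 10), ('DEF', 18, 10), ('DEF', 19, 10),
('DEF', 20, 10), ('DEF', 21, 10);

-- One row per (SKU, Ref) group: first week, last week and inclusive length.
WITH Ranges AS
(
    SELECT
        SKU,
        Ref,
        MIN(Wk) AS StartWk,
        MAX(Wk) AS EndWk,
        MAX(Wk) - MIN(Wk) + 1 AS WeekCount
    FROM @Periods
    GROUP BY SKU, Ref
)
SELECT
    SKU,
    StartWk,
    EndWk,
    WeekCount,
    -- 1 = longest range for the SKU; ties go to the earlier start week
    ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY WeekCount DESC, StartWk) AS SizeRank
FROM Ranges
ORDER BY SKU, StartWk;
```

On the sample data this gives four ranges: ABC weeks 1 to 2 (count 2), ABC weeks 7 to 9 (count 3), DEF weeks 10 to 11 (count 2) and DEF weeks 15 to 21 (count 7). Wrapping the final SELECT in another CTE and filtering on SizeRank = 1 would keep only the largest range per SKU, and a length rule such as the question's "> 13 and <= 52" could be added as a HAVING clause in Ranges.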

Answered 2014-07-25T14:55:10.747