
I have a tricky SQL problem. This is on SQL Server 2008 R2.

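From a Log table, I have to merge consecutive records that have the same message (MSG), count how many messages were merged (COUNT), and then delete the duplicates. This also has to be done within a date range, so that records outside the range are not affected.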

To make this easier to understand, here is a small sample of the data:

ID  DATE       MSG  COUNT
1   2013-08-17 mail NULL
2   2013-08-17 mail NULL
3   2013-08-17 www  NULL
4   2013-08-18 www  NULL
5   2013-08-18 www  NULL
6   2013-08-18 www  NULL
7   2013-08-18 mail NULL
8   2013-08-18 www  NULL
9   2013-08-19 mail NULL
10  2013-08-19 mail NULL
11  2013-08-20 mail NULL
12  2013-08-20 mail NULL
13  2013-08-21 www  NULL
14  2013-08-22 mail NULL
15  2013-08-22 mail NULL
16  2013-08-23 mail NULL
17  2013-08-23 mail NULL
18  2013-08-23 mail NULL

The result should look like this:

ID  DATE       MSG  COUNT
1   2013-08-17 mail NULL
2   2013-08-17 mail NULL
3   2013-08-17 www  NULL
6   2013-08-18 www  3
7   2013-08-18 mail 1
8   2013-08-18 www  1
12  2013-08-20 mail 4
13  2013-08-21 www  1
15  2013-08-22 mail 2
16  2013-08-23 mail NULL
17  2013-08-23 mail NULL
18  2013-08-23 mail NULL

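So, basically, the query should

  1. Process only the data within a given date range (in this example, from 2013-08-18 to 2013-08-22)
  2. Combine consecutive rows that have the same text in the MSG field
  3. Count the combined rows and store that value in the COUNT field
  4. Delete the duplicate records (in this example, ID 6 is kept, but ID 5 and ID 4 should be deleted)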
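Steps 1 to 3 can be read as a "gaps and islands" problem. The following is only a sketch under assumptions, not a confirmed solution: it assumes that ID order defines what "consecutive" means, and it uses a hypothetical table variable @Log in place of the real Log table. The names grp, KeepID and RunLength are made up for illustration. It avoids LAG/LEAD, which SQL Server 2008 R2 does not have.

```sql
-- Hypothetical stand-in for the Log table, loaded with the sample data.
DECLARE @Log TABLE (ID int PRIMARY KEY, [DATE] date, MSG varchar(50), [COUNT] int NULL);

INSERT INTO @Log (ID, [DATE], MSG) VALUES
    (1, '2013-08-17', 'mail'), (2, '2013-08-17', 'mail'), (3, '2013-08-17', 'www'),
    (4, '2013-08-18', 'www'),  (5, '2013-08-18', 'www'),  (6, '2013-08-18', 'www'),
    (7, '2013-08-18', 'mail'), (8, '2013-08-18', 'www'),  (9, '2013-08-19', 'mail'),
    (10, '2013-08-19', 'mail'), (11, '2013-08-20', 'mail'), (12, '2013-08-20', 'mail'),
    (13, '2013-08-21', 'www'),  (14, '2013-08-22', 'mail'), (15, '2013-08-22', 'mail'),
    (16, '2013-08-23', 'mail'), (17, '2013-08-23', 'mail'), (18, '2013-08-23', 'mail');

DECLARE @from date = '2013-08-18', @to date = '2013-08-22';

-- Within a run of equal MSG values, the overall row number and the per-MSG
-- row number increase together, so their difference (grp) stays constant.
-- A change of MSG shifts the difference, which starts a new run.
WITH InRange AS (
    SELECT ID, MSG,
           ROW_NUMBER() OVER (ORDER BY ID)
         - ROW_NUMBER() OVER (PARTITION BY MSG ORDER BY ID) AS grp
    FROM @Log
    WHERE [DATE] BETWEEN @from AND @to
)
SELECT MAX(ID) AS KeepID,      -- last row of the run, the one to keep
       MSG,
       COUNT(*) AS RunLength   -- value destined for the COUNT column
FROM InRange
GROUP BY MSG, grp
ORDER BY KeepID;

-- Result for the sample data:
-- KeepID  MSG   RunLength
-- 6       www   3
-- 7       mail  1
-- 8       www   1
-- 12      mail  4
-- 13      www   1
-- 15      mail  2
```

Because the date filter is applied before the row numbers are computed, ID 3 (www, outside the range) is not merged into the run that ends at ID 6.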

Since I am not an SQL expert, I would really appreciate any help, suggestions, or SQL queries.


3 Answers


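My idea is to do it with 2 queries:

(i) The first one only counts and updates the records.

(ii) The second one deletes all records in the given date range that have a NULL value in the COUNT column.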

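EDIT: I did step (i), but I could not make it keep the NULL value in COUNT for the rows that are to be deleted. It updates all rows with the COUNT. Now you just need to DELETE the right rows.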

Step (i)

(For MySQL)

-- Counts rows per (date, msg) and writes that count to every row in the range.
UPDATE tab ta
JOIN (SELECT date, msg, COUNT(*) AS cnt
      FROM tab
      GROUP BY date, msg) tb
  ON ta.date = tb.date AND ta.msg = tb.msg
SET ta.`count` = tb.cnt
WHERE ta.date BETWEEN DATE('2013-08-18') AND DATE('2013-08-22');

PS: The DATE syntax I used is for MySQL; you can adapt it for MS SQL Server.

(For MS SQL Server)

-- Same logic in T-SQL: every row in the range receives the count of its (date, msg) group.
UPDATE ta
SET ta.[count] = tb.cnt
FROM tab AS ta
JOIN (SELECT [date], msg, COUNT(*) AS cnt
      FROM tab
      GROUP BY [date], msg) AS tb
  ON ta.[date] = tb.[date] AND ta.msg = tb.msg
WHERE ta.[date] BETWEEN CAST('2013-08-18' AS DATE) AND CAST('2013-08-22' AS DATE);
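
The two-step plan (update first, then delete the in-range rows whose COUNT is still NULL) could also be applied to runs of consecutive rows instead of (date, msg) groups. The following is only a sketch under assumptions, not a tested solution for the real table: it assumes that ID order defines "consecutive" and that COUNT is NULL on every in-range row beforehand. @Log is a hypothetical table variable standing in for the real table, and grp, KeepID and RunLength are made-up names.

```sql
-- Hypothetical stand-in for the Log table, loaded with the question's sample data.
DECLARE @Log TABLE (ID int PRIMARY KEY, [DATE] date, MSG varchar(50), [COUNT] int NULL);

INSERT INTO @Log (ID, [DATE], MSG) VALUES
    (1, '2013-08-17', 'mail'), (2, '2013-08-17', 'mail'), (3, '2013-08-17', 'www'),
    (4, '2013-08-18', 'www'),  (5, '2013-08-18', 'www'),  (6, '2013-08-18', 'www'),
    (7, '2013-08-18', 'mail'), (8, '2013-08-18', 'www'),  (9, '2013-08-19', 'mail'),
    (10, '2013-08-19', 'mail'), (11, '2013-08-20', 'mail'), (12, '2013-08-20', 'mail'),
    (13, '2013-08-21', 'www'),  (14, '2013-08-22', 'mail'), (15, '2013-08-22', 'mail'),
    (16, '2013-08-23', 'mail'), (17, '2013-08-23', 'mail'), (18, '2013-08-23', 'mail');

DECLARE @from date = '2013-08-18', @to date = '2013-08-22';

-- Step (i): set COUNT only on the last row of each run of equal MSG values.
-- grp is constant within a run because both row numbers advance together.
WITH InRange AS (
    SELECT ID, MSG,
           ROW_NUMBER() OVER (ORDER BY ID)
         - ROW_NUMBER() OVER (PARTITION BY MSG ORDER BY ID) AS grp
    FROM @Log
    WHERE [DATE] BETWEEN @from AND @to
),
Runs AS (
    SELECT MAX(ID) AS KeepID, COUNT(*) AS RunLength
    FROM InRange
    GROUP BY MSG, grp
)
UPDATE l
SET [COUNT] = r.RunLength
FROM @Log AS l
JOIN Runs AS r
  ON r.KeepID = l.ID;

-- Step (ii): the remaining in-range rows still have COUNT = NULL; delete them.
DELETE FROM @Log
WHERE [DATE] BETWEEN @from AND @to
  AND [COUNT] IS NULL;

-- For the sample data this leaves IDs 1, 2, 3, 6, 7, 8, 12, 13, 15, 16, 17, 18,
-- with COUNT values 3, 1, 1, 4, 1, 2 on IDs 6, 7, 8, 12, 13, 15.
SELECT ID, [DATE], MSG, [COUNT]
FROM @Log
ORDER BY ID;
```

On a real table it would be safer to run both statements inside one transaction, so that a failure between the UPDATE and the DELETE does not leave the table half processed.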
answered 2013-09-17T13:07:04.290

Try this:

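-- Drop the temp table only if it exists, so the script also runs the first time.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp
GO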
select
    * 
into #temp
from (
    -- The first row fixes the column types; [count] is a real NULL, not the string 'NULL'.
    select 1 as id, cast('2013-08-17' as date) as [date], 'mail' as msg, cast(null as int) as [count] union all
    select 2, '2013-08-17', 'mail', null union all
    select 3, '2013-08-17', 'www', null union all
    select 4, '2013-08-18', 'www', null union all
    select 5, '2013-08-18', 'www', null union all
    select 6, '2013-08-18', 'www', null union all
    select 7, '2013-08-18', 'mail', null union all
    select 8, '2013-08-18', 'www', null union all
    select 9, '2013-08-19', 'mail', null union all
    select 10, '2013-08-19', 'mail', null union all
    select 11, '2013-08-20', 'mail', null union all
    select 12, '2013-08-20', 'mail', null union all
    select 13, '2013-08-21', 'www', null union all
    select 14, '2013-08-22', 'mail', null union all
    select 15, '2013-08-22', 'mail', null union all
    select 16, '2013-08-23', 'mail', null union all
    select 17, '2013-08-23', 'mail', null union all
    select 18, '2013-08-23', 'mail', null
) x
GO


-- Number the in-range rows within each (date, msg) group, in id order.
select
    t.*,
    x.rwn
from #temp t
join (
    select
        id,
        rwn = row_number() over (partition by [date], [msg] order by id)
    from #temp
    where [date] between '2013-08-18' and '2013-08-22'
) x
    on t.id = x.id
order by
    t.[date], t.msg

Just turn this into an UPDATE and then delete all rows where rwn > 1.

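EDIT: Your column's data type is probably text, which is why you get an error when sorting or comparing it. Do you really need text? It is a large object (LOB) data type that can store up to 2 GB of text. Try changing it to varchar(8000), for example, or to varchar(max) if the messages really are that large.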
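
As a rough illustration of the two options for a text column (casting it in the query, or changing the column type once), here is a minimal sketch. #LogText is a hypothetical temp table, not the real Log table, and the sketch assumes the existing values convert to varchar(max) without problems.

```sql
-- Hypothetical temp table with a text column, standing in for the real table.
CREATE TABLE #LogText (ID int PRIMARY KEY, MSG text);
INSERT INTO #LogText (ID, MSG) VALUES (1, 'mail'), (2, 'mail'), (3, 'www');

-- Option 1: leave the column as it is and cast it wherever it must be compared.
SELECT ID,
       ROW_NUMBER() OVER (PARTITION BY CAST(MSG AS varchar(max)) ORDER BY ID) AS rwn
FROM #LogText;

-- Option 2: change the column type once, so MSG can be used directly.
ALTER TABLE #LogText ALTER COLUMN MSG varchar(max);

SELECT ID,
       ROW_NUMBER() OVER (PARTITION BY MSG ORDER BY ID) AS rwn
FROM #LogText;

DROP TABLE #LogText;
```

Option 1 leaves the schema untouched but repeats the cast in every query; option 2 is a one-time change that may take a while on a large table.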

answered 2013-09-17T13:11:50.683

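Hi, please try this; I hope it helps. As I understand it, you need to group the rows, delete the duplicates, and keep only one row per group. Sorry for my English.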

DECLARE @Table_2 TABLE (ID INT, [DATE] date, MSG Varchar(50), [COUNT] int)
Declare @fromDate as date = '2013-08-18'
Declare @toDate as date = '2013-08-22'

-- One row per (DATE, MSG) in the range: the highest ID and the number of rows in the group.
INSERT INTO @Table_2 (ID, [DATE], MSG, [COUNT])
SELECT MAX(ID) AS ID, [DATE], MSG, COUNT(*) AS [COUNT]
FROM dbo.Table_1
WHERE [DATE] BETWEEN @fromDate AND @toDate
GROUP BY [DATE], MSG;



-- Write the group count to the row that is kept.
UPDATE T1
SET [COUNT] = T2.[COUNT]
FROM dbo.Table_1 AS T1
INNER JOIN @Table_2 AS T2
    ON T1.ID = T2.ID;


-- Remove the other rows of each group, keeping only the row whose ID is stored in @Table_2.
DELETE T1
FROM dbo.Table_1 AS T1
INNER JOIN @Table_2 AS T2
    ON T1.[DATE] = T2.[DATE] AND T1.MSG = T2.MSG
WHERE T1.ID <> T2.ID;
answered 2013-09-18T03:21:33.727