5

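Given 2 or more rows selected for merging, one of them is identified as the template row. The other rows should merge their data into any null-valued columns the template has.

Sample data: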

Id  Name     Address          City          State   Active  Email             Date
1   Acme1    NULL             NULL          NULL    NULL    blah@yada.com     3/1/2011
2   Acme1    1234 Abc Rd      Springfield   OR      0       blah@gmail.com    1/12/2012
3   Acme2    NULL             NULL          NULL    1       blah@yahoo.com    4/19/2012

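Suppose the user picks the row with ID 1 as the template row; the rows with IDs 2 and 3 are merged into row 1 and then deleted. Any null column in row ID 1 should be filled with the most recent (see the Date column) non-null value, if one exists, and non-null values already present in row ID 1 should be left as they are. The query result for the data above should look like this: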

Id  Name     Address          City          State   Active  Email             Date
1   Acme1    1234 Abc Rd      Springfield   OR      1       blah@yada.com     3/1/2011

Note that the Active value is 1, not 0, because row ID 3 has the most recent date.
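
To make the rule concrete, here is a minimal Python sketch of the intended merge semantics, applied to the sample rows above. It is only an illustration: `merge_into_template` is a hypothetical helper name, `None` stands in for NULL, and the dates are written as `date` objects so they sort chronologically.

```python
from datetime import date

# Sample rows from the question; None stands in for NULL.
rows = [
    {"Id": 1, "Name": "Acme1", "Address": None, "City": None, "State": None,
     "Active": None, "Email": "blah@yada.com", "Date": date(2011, 3, 1)},
    {"Id": 2, "Name": "Acme1", "Address": "1234 Abc Rd", "City": "Springfield",
     "State": "OR", "Active": 0, "Email": "blah@gmail.com", "Date": date(2012, 1, 12)},
    {"Id": 3, "Name": "Acme2", "Address": None, "City": None, "State": None,
     "Active": 1, "Email": "blah@yahoo.com", "Date": date(2012, 4, 19)},
]

def merge_into_template(rows, template_id):
    """Return a copy of the template row with its None columns filled in."""
    template = next(r for r in rows if r["Id"] == template_id)
    # Donor rows, newest first, so the first non-None value found is the most recent.
    donors = sorted((r for r in rows if r["Id"] != template_id),
                    key=lambda r: r["Date"], reverse=True)
    merged = dict(template)
    for column, value in template.items():
        if value is None:
            # Falls back to None when no donor has a value for this column.
            merged[column] = next(
                (d[column] for d in donors if d[column] is not None), None)
    return merged

merged = merge_into_template(rows, template_id=1)
```

Here `merged["Active"]` comes from row 3 (the newest donor with a value), while `Name`, `Email` and `Date` keep the template's own values.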

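PS: Also, is there a way to do this without explicitly defining or knowing all the column names in advance? The actual table I am working with has a large number of columns, and new ones are added all the time. Is there a way to look up all the column names in the table and then use that subquery or temp table to do the job?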

4

4 Answers

2

You can do this by numbering the rows so that the template row always comes last and the remaining rows precede it in ascending date order. For each column, max() then finds the highest-numbered row holding a non-null value: the template row if it has one, otherwise the most recent of the other rows. Finally, each column is selected from the row matching that maximum.

; with rows as (
    select test.*,
  -- Template row must be last - how do you decide which one is template row?
  -- In this case template row is the one with id = 1
    row_number() over (order by case when id = 1 then 1 else 0 end,
                       date) rn
    from test
  -- Your list of rows to merge goes here
  -- where id in ( ... )
),
-- For each column, find the highest-numbered row that holds a non-null value
positions as (
  select
    max (case when Name is not null then rn else 0 end) NamePosition,
    max (case when Address is not null then rn else 0 end) AddressPosition,
    max (case when City is not null then rn else 0 end) CityPosition,
    max (case when State is not null then rn else 0 end) StatePosition,
    max (case when Active is not null then rn else 0 end) ActivePosition,
    max (case when Email is not null then rn else 0 end) EmailPosition,
    max (case when Date is not null then rn else 0 end) DatePosition
  from rows
)
-- Finally, combine these columns into one row
select 
  (select Name from rows cross join Positions where rn = NamePosition) name,
  (select Address from rows cross join Positions where rn = AddressPosition) Address,
  (select City from rows cross join Positions where rn = CityPosition) City,
  (select State from rows cross join Positions where rn = StatePosition) State,
  (select Active from rows cross join Positions where rn = ActivePosition) Active,
  (select Email from rows cross join Positions where rn = EmailPosition) Email,
  (select Date from rows cross join Positions where rn = DatePosition) Date
from test
-- Any id will suffice, or even DISTINCT
where id = 1

You might check it at Sql Fiddle.

EDIT:

The cross joins in the last section could be written as inner joins on rows.rn = xxxPosition. The query works as it stands, but switching to inner joins would be an improvement.
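
As a rough illustration of the join-based variant, here is a sketch that runs the same ranking idea against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite stands in for SQL Server (window functions need SQLite 3.25 or later), the CTE is named `ranked`, dates are stored as ISO strings so they sort correctly, only three columns are shown, and LEFT JOIN is used instead of an inner join so that a column that is null in every row yields NULL rather than dropping the result row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                City TEXT, State TEXT, Active INTEGER, Email TEXT, Date TEXT)""")
conn.executemany("INSERT INTO test VALUES (?,?,?,?,?,?,?,?)", [
    (1, "Acme1", None, None, None, None, "blah@yada.com", "2011-03-01"),
    (2, "Acme1", "1234 Abc Rd", "Springfield", "OR", 0, "blah@gmail.com", "2012-01-12"),
    (3, "Acme2", None, None, None, 1, "blah@yahoo.com", "2012-04-19"),
])

query = """
WITH ranked AS (
    -- Template row gets the highest rn; the others are numbered by ascending date.
    SELECT test.*,
           ROW_NUMBER() OVER (ORDER BY CASE WHEN Id = ? THEN 1 ELSE 0 END, Date) AS rn
    FROM test
),
positions AS (
    -- Highest rn holding a non-null value, per column (0 if there is none).
    SELECT MAX(CASE WHEN Address IS NOT NULL THEN rn ELSE 0 END) AS AddressPosition,
           MAX(CASE WHEN Active  IS NOT NULL THEN rn ELSE 0 END) AS ActivePosition,
           MAX(CASE WHEN Email   IS NOT NULL THEN rn ELSE 0 END) AS EmailPosition
    FROM ranked
)
SELECT a.Address, b.Active, e.Email
FROM positions AS p
LEFT JOIN ranked AS a ON a.rn = p.AddressPosition
LEFT JOIN ranked AS b ON b.rn = p.ActivePosition
LEFT JOIN ranked AS e ON e.rn = p.EmailPosition
"""
result = conn.execute(query, (1,)).fetchone()
```

Because `positions` has exactly one row and `rn` is unique, each join matches at most one row, so the query returns a single merged row.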

answered 2012-04-19T22:36:18.657
1

It's not so complicated.

First, DECLARE @templateID INT = 1 so you can remember which row is treated as the template.

Now find latest NOT NULL values (exclude template row). The easiest way is to use TOP 1 subqueries for each column:

SELECT
(SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
(SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS AddressName
-- add more columns here

Wrap the above in a CTE (Common Table Expression) so you have a clean input for your UPDATE:

WITH Latest_CTE (CTE_LatestName, CTE_LatestAddress) -- add more columns here; the CTE_ prefix distinguishes source columns from target columns
AS
-- Define the CTE query.
(
    SELECT
    (SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
    (SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS AddressName
    -- add more columns here
)
UPDATE
<update statement here (below)>

Now do a smart UPDATE of your template row using ISNULL. It acts as a conditional update: a column is updated only if the target column is null.

WITH
<common expression statement here (above)>
UPDATE DataTab
SET 
Name = ISNULL(Name, CTE_LatestName), -- if Name is null then set Name to CTE_LatestName else keep Name as Name
Address = ISNULL(Address, CTE_LatestAddress)
-- add more columns here..
FROM DataTab
CROSS JOIN Latest_CTE -- the CTE returns a single row, so this only makes its columns available
WHERE ID = @templateID

The last task is to delete the rows other than the template row:

DELETE FROM DataTab WHERE NOT ID = @templateID

Clear?
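
One thing to watch: as written, the subqueries and the DELETE above consider every non-template row in the table, not just the rows the user selected. Below is a minimal sketch of the same update-then-delete flow restricted to the selected rows and wrapped in a transaction. It is written against SQLite from Python so it can be run as-is; the temp table `donors`, the row with ID 4, the reduced column set and the ISO date strings are assumptions made for illustration, and `COALESCE` with `LIMIT 1` replaces `ISNULL` with `TOP 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DataTab (ID INTEGER PRIMARY KEY, Name TEXT,
                Address TEXT, Active INTEGER, Date TEXT)""")
conn.executemany("INSERT INTO DataTab VALUES (?,?,?,?,?)", [
    (1, "Acme1", None, None, "2011-03-01"),
    (2, "Acme1", "1234 Abc Rd", 0, "2012-01-12"),
    (3, "Acme2", None, 1, "2012-04-19"),
    (4, "Other", "9 Elm St", 0, "2012-05-01"),  # hypothetical row that is NOT being merged
])

template_id = 1
# Hypothetical temp table holding the IDs the user chose to merge into the template.
conn.execute("CREATE TEMP TABLE donors (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO donors VALUES (?)", [(2,), (3,)])
conn.commit()

with conn:  # commits on success, rolls back if either statement raises
    conn.execute("""
        UPDATE DataTab SET
            Address = COALESCE(Address,
                (SELECT d.Address FROM DataTab AS d
                 WHERE d.Address IS NOT NULL AND d.ID IN (SELECT ID FROM donors)
                 ORDER BY d.Date DESC LIMIT 1)),
            Active = COALESCE(Active,
                (SELECT d.Active FROM DataTab AS d
                 WHERE d.Active IS NOT NULL AND d.ID IN (SELECT ID FROM donors)
                 ORDER BY d.Date DESC LIMIT 1))
        WHERE ID = ?""", (template_id,))
    # Delete only the merged rows, leaving unrelated rows untouched.
    conn.execute("DELETE FROM DataTab WHERE ID IN (SELECT ID FROM donors)")

template_row = conn.execute(
    "SELECT Name, Address, Active FROM DataTab WHERE ID = ?", (template_id,)).fetchone()
remaining_ids = [r[0] for r in conn.execute("SELECT ID FROM DataTab ORDER BY ID")]
```

Row 4 has the newest date, but it is neither used as a source nor deleted because it is not in `donors`.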

answered 2012-04-19T22:40:55.770
1

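For dynamic columns, you will need to write the solution using dynamic SQL.

You can query sys.columns and sys.tables to get the list of columns you need. Then loop backwards once per null column, find the first row with a non-null value in that column, and update the output row's column. Once the loop reaches 0, you have a complete row that you can then show to the user.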
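
The general shape of that idea (discover the columns at run time, then generate the statement) can be sketched as follows. This is an analogue, not SQL Server code: it uses SQLite from Python, where `PRAGMA table_info` plays the role of sys.columns. The helper name `build_merge_update`, the `key` and `order_by` parameters, and the ISO date strings are assumptions for illustration. It builds one UPDATE covering every column instead of looping row by row.

```python
import sqlite3

def build_merge_update(conn, table, key="ID", order_by="Date"):
    """Build an UPDATE that fills the template row's NULL columns from the newest other row."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name must come from trusted code, since it is spliced into the SQL text.
    columns = [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
    assignments = [
        f'"{c}" = COALESCE("{c}", (SELECT d."{c}" FROM "{table}" AS d '
        f'WHERE d."{c}" IS NOT NULL AND d."{key}" <> :template '
        f'ORDER BY d."{order_by}" DESC LIMIT 1))'
        for c in columns if c != key
    ]
    return (f'UPDATE "{table}" SET ' + ", ".join(assignments)
            + f' WHERE "{key}" = :template')

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Dummy (ID INTEGER, Name TEXT, Address TEXT, City TEXT,
                State TEXT, Active INTEGER, Email TEXT, Date TEXT)""")
conn.executemany("INSERT INTO Dummy VALUES (?,?,?,?,?,?,?,?)", [
    (1, "Acme1", None, None, None, None, "blah@yada.com", "2011-03-01"),
    (2, "Acme1", "1234 Abc Rd", "Springfield", "OR", 0, "blah@gmail.com", "2012-01-12"),
    (3, "Acme2", None, None, None, 1, "blah@yahoo.com", "2012-04-19"),
])

sql = build_merge_update(conn, "Dummy")
conn.execute(sql, {"template": 1})
row = conn.execute("SELECT * FROM Dummy WHERE ID = 1").fetchone()
```

Columns added to the table later are picked up automatically the next time the statement is built, which is the point of reading the column list from the catalog.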

answered 2012-04-19T23:25:29.687
1

I should pay attention to posting dates. In any case, here's a solution using dynamic SQL to build out an update statement. It should give you something to build from, anyway.

There's some extra code in there to validate the results along the way, but I tried to comment in a way that made that non-vital code apparent.

CREATE TABLE 
dbo.Dummy 
    (
    [ID] int ,
    [Name] varchar(30),
    [Address] varchar(40) null,
    [City]  varchar(30) NULL,
    [State] varchar(2) NULL,
    [Active] tinyint NULL,
    [Email] varchar(30) NULL,
    [Date] date NULL
    );
--
INSERT dbo.Dummy
VALUES
(
    1, 'Acme1', NULL, NULL, NULL, NULL, 'blah@yada.com', '3/1/2011'
)
,
(
    2, 'Acme1', '1234 Abc Rd', 'Springfield', 'OR', 0, 'blah@gmail.com', '1/12/2012'
)
,
(
    3, 'Acme2', NULL, NULL, NULL, 1, 'blah@yahoo.com', '4/19/2012'
);
DECLARE 
    @TableName nvarchar(128) = 'Dummy',
    @TemplateID int = 1,
    @SetStmtList nvarchar(max) = '',
    @LoopCounter int = 0,
    @ColumnCount int = 0,
    @SQL nvarchar(max) = ''
    ;
--
--Create a table to hold the column names
DECLARE     
    @ColumnList table 
        (
        ColumnID tinyint IDENTITY,
        ColumnName nvarchar(128)
        );
--
--Get the column names
INSERT @ColumnList
(
    ColumnName
)
    SELECT
        c.name
    FROM
        sys.columns AS c
        JOIN
        sys.tables AS t
            ON
                t.object_id = c.object_id
    WHERE
        t.name = @TableName;
--
--Create loop boundaries to build out the SQL statement
SELECT
    @ColumnCount = MAX( l.ColumnID ),
    @LoopCounter = MIN (l.ColumnID )
FROM
    @ColumnList AS l;
--
--Loop over the column names
WHILE @LoopCounter <= @ColumnCount
BEGIN
    --Dynamically construct SET statements for each column except ID (See the WHERE clause)
    SELECT 
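        -- QUOTENAME brackets each identifier so reserved words (e.g. Date, State) and unusual column names are safe
        @SetStmtList = @SetStmtList + ',' + QUOTENAME(l.ColumnName) + ' = COALESCE(' + QUOTENAME(l.ColumnName) + ', (SELECT TOP 1 ' + QUOTENAME(l.ColumnName) + ' FROM ' + QUOTENAME(@TableName) + ' WHERE ' + QUOTENAME(l.ColumnName) + ' IS NOT NULL AND ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX )) + ' ORDER BY [Date] DESC)) '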
    FROM 
        @ColumnList AS l
    WHERE 
        l.ColumnID = @LoopCounter
        AND
        l.ColumnName <> 'ID';
--
    SELECT
        @LoopCounter = @LoopCounter + 1;
--
END;

--TESTING - Validate the initial table values
SELECT * FROM dbo.Dummy ;
--
--Get rid of the leading comma in the SetStmtList
SET @SetStmtList = SUBSTRING( @SetStmtList, 2, LEN( @SetStmtList ) - 1 );
--Build out the rest of the UPDATE statement
SET @SQL = 'UPDATE ' + @TableName  + ' SET ' + @SetStmtList + ' WHERE ID = ' + CAST(@TemplateID AS NVARCHAR(MAX ))
--Then execute the update
EXEC sys.sp_executesql
    @SQL;
--
--TESTING - Validate the updated table values
SELECT * FROM dbo.Dummy ;
--
--Build out the DELETE statement
SET @SQL = 'DELETE FROM ' + @TableName + ' WHERE ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX ))
--Execute the DELETE
EXEC sys.sp_executesql
    @SQL;
--
--TESTING - Validate the final table values
SELECT * FROM dbo.Dummy; 
--
DROP TABLE dbo.Dummy;
answered 2016-01-26T01:18:24.797