
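The USER_DIM table in our database has a problem. Other tables in the same database, such as USER_ACTIVITY_FACT, reference it through USER_ID.

The problem arose because the original table design did not enforce a proper unique identifier on EXTERNAL_ID, which actually comes from a different database. As a result, the same EXTERNAL_ID can appear under several USER_IDs.

I can figure out how to select the extra rows or delete them, but I want to update the rows in the other tables to point to the original USER_ID and then delete the extra rows from USER_DIM.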

-- Fact rows that point at a duplicate (non-minimal) USER_ID
SELECT USER_ACTIVITY_FACT.USER_ID
FROM USER_ACTIVITY_FACT
WHERE USER_ACTIVITY_FACT.USER_ID IN (
  SELECT ud1.USER_ID
  FROM USER_DIM AS ud1
  WHERE ud1.EXTERNAL_ID IN (SELECT ud2.EXTERNAL_ID
                            FROM USER_DIM AS ud2
                            WHERE ud1.USER_ID > ud2.USER_ID));

Update these to the smallest USER_ID that shares the same EXTERNAL_ID.

Then run a DELETE statement against USER_DIM:

-- Remove every row that has a sibling with the same EXTERNAL_ID and a smaller USER_ID
DELETE
  FROM USER_DIM
  WHERE EXTERNAL_ID IN (SELECT ud2.EXTERNAL_ID
                        FROM USER_DIM AS ud2
                        WHERE USER_DIM.USER_ID > ud2.USER_ID);

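Afterwards, alter the table so that the EXTERNAL_ID column has a unique index.

The update could be done row by row, but I would prefer to update all rows that reference the offending extra USER_IDs at once.

Thanks in advance for your help!

UPDATE: To clarify the goal: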

USER_ACTIVITY_FACT 
-------------
USER_ID
2 
3
4
5
6

USER_DIM
--------------
USER_ID  EXTERNAL_ID
2        23
3        23
4        24
5        24
6        26

...the result should look like:

USER_ACTIVITY_FACT 
-------------
USER_ID
2 
2
4
4
6

USER_DIM
--------------
USER_ID  EXTERNAL_ID
2        23
4        24
6        26

Hope this helps.


3 Answers

1

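I'm not sure I understood the request correctly, but here is what I came up with. You can use GROUP BY to find the minimum USER_ID for each EXTERNAL_ID and store it in a temp table as mapping information (OLD_ID => NEW_ID). Then join each table that needs updating to the temp table and update from the old ID to the new ID (join on OLD_ID, set to NEW_ID). Finally, delete as before, or join to the mapping table again.

You can check the SQLFiddle demo.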

--prepare data and insert into #mapping temp table from dim
WITH CTE1 AS 
(
    SELECT EXTERNAL_ID, MIN(USER_ID) AS NEW_USER_ID
    FROM dbo.USER_DIM
    GROUP BY EXTERNAL_ID
)
SELECT  CTE1.EXTERNAL_ID ,
        USER_ID AS OLD_USER_ID ,
        NEW_USER_ID
INTO #mapping
FROM dbo.USER_DIM
INNER JOIN CTE1 ON dbo.USER_DIM.EXTERNAL_ID = CTE1.EXTERNAL_ID;

--check your mappings
SELECT * FROM #mapping;

--update fact table based on join on mappings
UPDATE fact 
SET fact.USER_ID = src.NEW_USER_ID
FROM #mapping src
INNER JOIN dbo.USER_ACTIVITY_FACT fact ON src.OLD_USER_ID = fact.USER_ID;

--check your fact table
SELECT * FROM USER_ACTIVITY_FACT;

--delete from dim where the mapping shows a mismatch (duplicate rows)
DELETE d
FROM dbo.USER_DIM d
INNER JOIN #mapping m ON d.USER_ID = m.OLD_USER_ID
WHERE m.NEW_USER_ID <> m.OLD_USER_ID;

--check your dim table
SELECT * FROM dbo.USER_DIM;
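
The mapping-table idea can also be tried outside SQL Server. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module and loaded with the sample rows from the question. The temp table name mapping and the column types are assumptions, and the UPDATE uses a correlated subquery instead of the T-SQL UPDATE ... FROM form, so it illustrates the technique rather than reproducing the script above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_DIM (USER_ID INTEGER PRIMARY KEY, EXTERNAL_ID INTEGER);
    CREATE TABLE USER_ACTIVITY_FACT (USER_ID INTEGER);
    INSERT INTO USER_DIM VALUES (2, 23), (3, 23), (4, 24), (5, 24), (6, 26);
    INSERT INTO USER_ACTIVITY_FACT VALUES (2), (3), (4), (5), (6);

    -- OLD_USER_ID => NEW_USER_ID, where NEW is the smallest id per EXTERNAL_ID
    CREATE TEMP TABLE mapping AS
    SELECT d.USER_ID AS OLD_USER_ID, m.NEW_USER_ID
    FROM USER_DIM AS d
    JOIN (SELECT EXTERNAL_ID, MIN(USER_ID) AS NEW_USER_ID
          FROM USER_DIM GROUP BY EXTERNAL_ID) AS m
      ON m.EXTERNAL_ID = d.EXTERNAL_ID;

    -- Repoint all affected fact rows in one set-based statement
    UPDATE USER_ACTIVITY_FACT
    SET USER_ID = (SELECT NEW_USER_ID FROM mapping
                   WHERE OLD_USER_ID = USER_ACTIVITY_FACT.USER_ID)
    WHERE USER_ID IN (SELECT OLD_USER_ID FROM mapping
                      WHERE NEW_USER_ID <> OLD_USER_ID);

    -- Remove the dimension rows that are no longer referenced
    DELETE FROM USER_DIM
    WHERE USER_ID IN (SELECT OLD_USER_ID FROM mapping
                      WHERE NEW_USER_ID <> OLD_USER_ID);
""")

fact_ids = [r[0] for r in conn.execute(
    "SELECT USER_ID FROM USER_ACTIVITY_FACT ORDER BY rowid")]
dim_rows = conn.execute(
    "SELECT USER_ID, EXTERNAL_ID FROM USER_DIM ORDER BY USER_ID").fetchall()
print(fact_ids)   # [2, 2, 4, 4, 6]
print(dim_rows)   # [(2, 23), (4, 24), (6, 26)]
```

The printed lists match the target state given in the question, which is a quick sanity check before running the real script.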
answered 2013-04-12T21:13:27.800
1

Use an UPDATE against a derived table, with an OUTPUT clause to capture the USER_IDs that were replaced.

DECLARE @delUserID TABLE(delUserID int) 

UPDATE x
SET x.USER_ID = x.NewUserID
OUTPUT DELETED.USER_ID INTO @delUserID
FROM (  
      SELECT f.USER_ID, 
             MIN(f.USER_ID) OVER(PARTITION BY u.EXTERNAL_ID) AS NewUserID             
      FROM dbo.USER_ACTIVITY_FACT f JOIN dbo.USER_DIM u ON f.USER_ID = u.USER_ID
      ) x
WHERE x.USER_ID != x.NewUserID      

DELETE USER_DIM
WHERE USER_ID IN (SELECT delUserID FROM @delUserID)

Demo on SQLFiddle
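
To see what the window function contributes on its own, here is a small sketch, assuming SQLite (3.25 or later, for window-function support) driven from Python's sqlite3 module and the sample USER_DIM rows from the question. It only computes the old-to-new mapping. The dictionary name remap is hypothetical, and the sketch does not attempt the T-SQL-specific update through a derived table or the OUTPUT clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_DIM (USER_ID INTEGER PRIMARY KEY, EXTERNAL_ID INTEGER);
    INSERT INTO USER_DIM VALUES (2, 23), (3, 23), (4, 24), (5, 24), (6, 26);
""")

# For every dimension row, find the smallest USER_ID in its EXTERNAL_ID group.
rows = conn.execute("""
    SELECT USER_ID,
           MIN(USER_ID) OVER (PARTITION BY EXTERNAL_ID) AS NEW_USER_ID
    FROM USER_DIM
    ORDER BY USER_ID
""").fetchall()

# Keep only the rows whose id would change; these are the duplicates.
remap = {old: new for old, new in rows if old != new}
print(remap)  # {3: 2, 5: 4}
```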

answered 2013-04-12T21:32:21.423
0

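Here is what I came up with. @Nenad's solution works, but I have to push this through a Liquibase script and was not sure whether the SQL Server-specific features would work there. I also checked this against SQLFiddle.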

--repoint each fact row at the smallest USER_ID that shares its EXTERNAL_ID
UPDATE USER_ACTIVITY_FACT 
SET 
    USER_ID = (SELECT 
            MIN(ud1.USER_ID)
        FROM
            USER_DIM as ud1
        WHERE
            ud1.EXTERNAL_ID = (SELECT 
                    MIN(ud2.EXTERNAL_ID)
                FROM
                    USER_DIM as ud2,
                    USER_DIM as ud3
                WHERE
                    ud2.EXTERNAL_ID = ud3.EXTERNAL_ID
                        AND ud3.USER_ID = USER_ACTIVITY_FACT.USER_ID)
); 

--delete dim rows that have a sibling with the same EXTERNAL_ID and a smaller USER_ID
DELETE      
  FROM USER_DIM  
WHERE EXISTS
    (SELECT * FROM USER_DIM t1 
     WHERE t1.EXTERNAL_ID = USER_DIM.EXTERNAL_ID
 AND USER_DIM.USER_ID > t1.USER_ID);
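
The last step from the question, the unique index on EXTERNAL_ID, can be rehearsed the same way. This is a rough sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module and the sample rows from the question. The index name UX_USER_DIM_EXTERNAL_ID is hypothetical, and the exact DDL for SQL Server or Liquibase may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_DIM (USER_ID INTEGER PRIMARY KEY, EXTERNAL_ID INTEGER);
    INSERT INTO USER_DIM VALUES (2, 23), (3, 23), (4, 24), (5, 24), (6, 26);

    -- Same shape as the DELETE above: drop rows that have a smaller sibling
    DELETE FROM USER_DIM
    WHERE EXISTS (SELECT 1 FROM USER_DIM AS t1
                  WHERE t1.EXTERNAL_ID = USER_DIM.EXTERNAL_ID
                    AND USER_DIM.USER_ID > t1.USER_ID);
""")

# No EXTERNAL_ID may appear twice before the index is created.
duplicates = conn.execute("""
    SELECT EXTERNAL_ID FROM USER_DIM
    GROUP BY EXTERNAL_ID HAVING COUNT(*) > 1
""").fetchall()

# UX_USER_DIM_EXTERNAL_ID is a hypothetical index name.
conn.execute(
    "CREATE UNIQUE INDEX UX_USER_DIM_EXTERNAL_ID ON USER_DIM (EXTERNAL_ID)")

# With the index in place, a new duplicate EXTERNAL_ID should be rejected.
try:
    conn.execute("INSERT INTO USER_DIM VALUES (7, 23)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(duplicates, rejected)  # [] True
```

Running the duplicate check first matters because creating the unique index fails if any EXTERNAL_ID still appears more than once.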
answered 2013-04-15T13:44:13.600