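I have a table with 51 million rows of data covering 500k products. I need to extract information about all 500k products, such as additional codes, but the query currently takes more than 30 minutes to run.
I have tried a few different iterations, but I only get good performance when I limit the product list in each sub-select and in the main select.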
Table snippet:
PNUM | EFFECTIVE_DATE | STAGE | ORG_ID | CURRENT_FLAG
-- | -- | -- | -- | --
2A1245 | 1999-10-01 | 07 | W6 | N
2A1245 | 2006-01-01 | 07 | U4 | N
2A1245 | 2007-11-21 | 07 | U4 | N
2A1245 | 2008-03-23 | 07 | KF | N
2A1245 | 2008-11-23 | 07 | KF | N
2A1245 | 2009-02-25 | 07 | KF | N
2A1245 | 2015-03-19 | 07 | U5 | N
2A1245 | 2015-04-14 | 07 | U6 | N
2A1245 | 2015-04-17 | 07 | U6 | N
2A1245 | 2015-05-01 | 07 | U6 | N
2A1245 | 2017-09-26 | 08 | 8X | N
2A1245 | 2019-02-20 | 08 | 8X | N
2A1245 | 2019-03-18 | 08 | 8X | N
2A1245 | 2019-04-24 | 08 | 8X | N
2A1245 | 2019-04-29 | 08 | 8X | N
2A1245 | 2019-05-11 | 08 | 8X | N
2A1245 | 2019-05-15 | 08 | 8X | N
2A1245 | 2019-06-05 | 08 | 1Z | N
2A1245 | 2019-06-08 | 09 | W1E | N
2A1245 | 2019-06-11 | 09 | W1E | N
2A1245 | 2019-08-19 | 09 | EBI | N
2A1245 | 2019-09-03 | 09 | EBI | Y
Query:
SELECT a.PNUM, c.STAGE, MIN(a.EFFECTIVE_DATE) AS NEW_DATE, c.STAGE_CHANGE
FROM D_PRODUCT a
LEFT JOIN (SELECT x.PNUM, x.STAGE
           FROM D_PRODUCT x
           WHERE x.CURRENT_FLAG = 'Y') b
    ON b.PNUM = a.PNUM
LEFT JOIN (SELECT y.PNUM, y.STAGE, MIN(y.EFFECTIVE_DATE) AS STAGE_CHANGE
           FROM D_PRODUCT y
           GROUP BY y.PNUM, y.STAGE) c
    ON b.PNUM = c.PNUM AND b.STAGE = c.STAGE
GROUP BY a.PNUM, c.STAGE, c.STAGE_CHANGE
Output:
PNUM | STAGE | NEW_DATE | STAGE_CHANGE
-- | -- | -- | --
2A1245 | 09 | 1999-10-01 | 2019-06-08
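When each sub-select and the main select are limited to a single product (PNUM), the query runs in a few seconds, but without that restriction it times out after about 30 minutes.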
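
For comparison, the same result can in principle be computed in a single pass over D_PRODUCT instead of joining the table to itself twice. The sketch below is untested against the real data and rests on two assumptions that the post does not confirm: the database supports window functions and common table expressions, and each PNUM has at most one row with CURRENT_FLAG = 'Y'. The names `flagged` and `CURRENT_STAGE` are made up for illustration.

```sql
-- Sketch only: assumes window-function support and at most one
-- CURRENT_FLAG = 'Y' row per PNUM (neither is stated in the post).
WITH flagged AS (
    SELECT PNUM,
           STAGE,
           EFFECTIVE_DATE,
           -- Copy the stage of the current row onto every row of the same product.
           MAX(CASE WHEN CURRENT_FLAG = 'Y' THEN STAGE END)
               OVER (PARTITION BY PNUM) AS CURRENT_STAGE
    FROM D_PRODUCT
)
SELECT PNUM,
       CURRENT_STAGE AS STAGE,
       -- Earliest date the product appears at all.
       MIN(EFFECTIVE_DATE) AS NEW_DATE,
       -- Earliest date the product appears in its current stage.
       MIN(CASE WHEN STAGE = CURRENT_STAGE THEN EFFECTIVE_DATE END) AS STAGE_CHANGE
FROM flagged
GROUP BY PNUM, CURRENT_STAGE;
```

If a product could have several current rows with different stages, the original query would return one row per stage, while this sketch would keep only the highest stage value, so the two are not equivalent in that case. Whether this form is actually faster depends on the engine and on the available indexes, so comparing the execution plans of both versions would be the way to find out.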