
I have a query in Qubole that takes almost 3 hours to run. I am looking for suggestions to reduce its execution time.

WITH
    -- d: the date 10 days before today, as a BIGINT (yyyymmdd) and as a DATE
d AS (
    SELECT CAST(CONCAT (
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
                ) AS BIGINT) AS day,
        CAST(CONCAT (
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 1, 4),
                '-',
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 6, 2),
                '-',
                SUBSTR(CAST(DATE_ADD('day', - 10, CAST(CURRENT_TIMESTAMP AS DATE)) AS VARCHAR), 9, 2)
                ) AS DATE) AS DATE,
        'FR' AS country
    )
SELECT 'Streaming' AS TRANSACTION,
    'Spotify' AS account,
    p_day,
    access,
    COUNT(DISTINCT customer_id) AS users,
    COUNT(*) AS units
FROM temp_1
WHERE day >= (
        SELECT day
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4

UNION ALL

SELECT 'Streaming' AS TRANSACTION,
    'Deezer' AS account,
    p_day,
    CASE 
        WHEN offer_code IN ('APP', 'BAO', 'BDP', 'BDS', 'BMO', 'BMS', 'BMW', 'BPF', 'BPP', 'BPR', 'BSO', 'BWE', 'BWP', 'BWS', 'DEE', 'DEP', 'ETT', 'EXT', 'FFX', 'IOS', 'OT1', 'PBH', 'PE1', 'PE2', 'PEM', 'PLS', 'PRM', 'PSC', 'PTP', 'SDP', 'SMG', 'SPF', 'SPP', 'SPR', 'SUP', 'SWE', 'SWP', '3M', 'FAM', 'GOO', 'GOF', 'HFP', 'HFF', 'HFI')
            THEN 'premium'
        WHEN offer_code IN ('BFR', 'MFS', 'MOD', 'SMR')
            THEN 'free'
        ELSE NULL
        END AS access,
    COUNT(DISTINCT masked_consumer_id) AS users,
    SUM(units_sold_streams) AS streams
FROM temp_2
WHERE day >= (
        SELECT day
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4

UNION ALL

SELECT 'Streaming' AS TRANSACTION,
    'Apple Music' AS account,
    ingest_datestamp AS p_day,
    'premium' AS access,
    COUNT(DISTINCT anonymized_person_id) AS users,
    COUNT(*) AS streams
FROM temp_streams1
WHERE ingest_datestamp >= (
        SELECT DATE
        FROM d
        )
    AND country_code = (
        SELECT country
        FROM d
        )
GROUP BY 1,
    2,
    3,
    4

1 Answer


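This will not do much for query performance, but it does simplify the code. The date calculation can be reduced to the following (tested on Presto):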

cast(DATE_FORMAT(DATE_ADD('day', -10, CURRENT_DATE),'%Y%m%d') as bigint) as day,
DATE_ADD('day', -10, CURRENT_DATE)                                       as date

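To improve performance, I suggest partitioning the tables by date and country code (depending on data size), and computing the date as a parameter rather than in a subquery, so that partition pruning works.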
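As a rough sketch of the second suggestion, the first branch of the query could filter on a constant expression instead of the scalar subqueries on `d`. This assumes `temp_1` is partitioned on `day` (a BIGINT in yyyymmdd form) and `country_code`; the question does not say how the tables are partitioned, so treat that as an assumption and adapt the column names to the real layout.

```sql
-- Sketch only: assumes temp_1 is partitioned on day and country_code.
SELECT 'Streaming' AS TRANSACTION,
    'Spotify' AS account,
    p_day,
    access,
    COUNT(DISTINCT customer_id) AS users,
    COUNT(*) AS units
FROM temp_1
-- Constant expression in place of (SELECT day FROM d)
WHERE day >= CAST(DATE_FORMAT(DATE_ADD('day', -10, CURRENT_DATE), '%Y%m%d') AS BIGINT)
    -- Literal in place of (SELECT country FROM d)
    AND country_code = 'FR'
GROUP BY 1,
    2,
    3,
    4
```

The other two branches would change the same way, with `ingest_datestamp >= DATE_ADD('day', -10, CURRENT_DATE)` for the Apple Music one. Running `EXPLAIN` on the query before and after should show whether the partition filter actually reaches the table scan.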

Answered 2019-05-14T15:41:29.427