I'm new to Apache Pig and trying to learn it. Is there an equivalent of SQL's COUNT(DISTINCT CASE WHEN ...) in Apache Pig?

For example, I'm trying to do something like this:

CREATE TABLE email_profile AS
SELECT user_id
, COUNT(DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END) AS clickthroughs
, COUNT(DISTINCT CASE WHEN email_code = 'O' THEN message_id ELSE NULL END) AS opened_messages
, COUNT(DISTINCT message_id) AS total_messages_received
FROM email_campaigns
 GROUP BY user_id;

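I can't use a FILTER email_campaigns BY email_code == 'C', because that would drop the rows needed for the other counts. Is there a way to do all of this in a single nested FOREACH block?

Thanks!

Edit:

As requested, here is some sample data. The fields are used_id, email_code, message_id.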

user1@example.com    O     111
user1@example.com    C     111
user2@example.com    O     111
user1@example.com    O     222
user2@example.com    O     333

Expected output:

user1@example.com    2    1    2
user2@example.com    2    0    2

1 Answer

You can do this with a nested FOREACH after you GROUP by used_id. See the comments in my code for more details.

Something like:

-- Firstly we group so the FOREACH is applied per used_id
A = GROUP email_campaigns BY used_id ;
B = FOREACH A {
        -- We need these three lines to accomplish the:
        -- DISTINCT CASE WHEN email_code = 'C' THEN message_id ELSE NULL END
        -- First, we get only cases where email_code == 'C'
        click_filt = FILTER email_campaigns BY email_code == 'C' ;
        -- Since we only want unique message_ids, we need to project it out
        click_proj = FOREACH click_filt GENERATE message_id ;
        -- Now we can find all unique message_ids for a given filter
        click_dist = DISTINCT click_proj ;

        opened_filt = FILTER email_campaigns BY email_code == 'O' ;
        opened_proj = FOREACH opened_filt GENERATE message_id ;
        opened_dist = DISTINCT opened_proj ;

        total_proj = FOREACH email_campaigns GENERATE message_id ;
        total_dist = DISTINCT total_proj ;
    GENERATE group AS used_id, COUNT(click_dist) AS clickthroughs,
                               COUNT(opened_dist) AS opened_messages,
                               COUNT(total_dist) AS total_messages_received ;
}

The output of B should be:

(user1@example.com,1,2,2)
(user2@example.com,0,2,2)
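
The script above assumes a relation named email_campaigns already exists. As a rough end-to-end sketch, the version below loads the sample data and uses Pig's bag projection shorthand (filtered_bag.message_id) in place of the inner FOREACH ... GENERATE. The file path, the tab delimiter, and the chararray types are assumptions made for illustration, not something stated in the question.

```pig
-- Assumption: the sample rows live in a tab-delimited file at this hypothetical path.
email_campaigns = LOAD 'email_campaigns.tsv' USING PigStorage('\t')
                  AS (used_id:chararray, email_code:chararray, message_id:chararray);

by_user = GROUP email_campaigns BY used_id;

counts = FOREACH by_user {
    -- Keep only click rows, project message_id, then de-duplicate
    click_filt  = FILTER email_campaigns BY email_code == 'C';
    click_ids   = click_filt.message_id;
    click_dist  = DISTINCT click_ids;

    -- Same pattern for opened messages
    opened_filt = FILTER email_campaigns BY email_code == 'O';
    opened_ids  = opened_filt.message_id;
    opened_dist = DISTINCT opened_ids;

    -- All distinct message_ids for this user, regardless of email_code
    total_ids   = email_campaigns.message_id;
    total_dist  = DISTINCT total_ids;

    GENERATE group AS used_id,
             COUNT(click_dist)  AS clickthroughs,
             COUNT(opened_dist) AS opened_messages,
             COUNT(total_dist)  AS total_messages_received;
};

-- Print the per-user counts to the console
DUMP counts;
```

With the sample rows from the question, this should yield the same two tuples as B, though DUMP does not guarantee their order.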

Let me know if you need any further clarification of what's going on.

answered 2013-10-17T20:37:56.483