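I have to generate a report that gives me the sum of the event counts in tables A, B, and C, which are stored using Hive. My S3 bucket is partitioned by Organization_id.
For example:
Table A – records every day John (and other employees) came to work
Table B – records every phone call John (and other employees) made or received at work
Table C – records every expense John (and other employees) submitted at work
Basically, I want the sum of the counts from A, B, and C for John (employee_id) for the last month. If any of the three tables A, B, or C has a record for a date, there should be exactly one row for that date (if more than one table has records for that date, their counts are added together). So my output would be: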
Employee id   Employee Name   Date          Count
123           John            02-Jan-2016   55
123           John            12-Jan-2016   88
123           John            19-Jan-2016   103
The query I came up with is:
select adcts.employee_name, adcts.employee_id, Total_count as event_count, adcts.event_date
from
  (select coalesce(Evts.employee_id, Imps.employee_id, AEvts.employee_id) as employee_id
        , coalesce(Evts.employee_name, Imps.employee_name, AEvts.employee_name) as employee_name
        , coalesce(Evts.Event_count, 0) + coalesce(Imps.Impression_count, 0) + coalesce(AEvts.Event_count, 0) as Total_count
        , coalesce(Evts.event_date, Imps.impression_date, AEvts.event_date) as event_date
   from
     (select employee_id, employee_name, count(*) as Event_count, event_date
      from mm_events
      where organization_id = 100048
        and event_date between '2016-02-01' and '2016-02-04'
      group by employee_id, employee_name, event_date) Evts
   full outer join
     (select employee_id, employee_name, count(*) as Impression_count, impression_date
      from mm_impressions
      where organization_id = 100048
        and impression_date between '2016-02-01' and '2016-02-04'
      group by employee_id, employee_name, impression_date) Imps
   on Evts.employee_id = Imps.employee_id
   full outer join
     (select employee_id, employee_name, count(*) as Event_count, event_date
      from mm_attributed_events
      where organization_id = 100048
        and event_date between '2016-02-01' and '2016-02-04'
        and event_type = 'click'
      group by employee_id, employee_name, event_date) AEvts
   on AEvts.employee_id = Evts.employee_id
  ) adcts
join
  (select distinct c.employee_id
   from default.t1_meta_dmp c
   where c.employee_dmp_enabled = 'inherits'
     and c.agency_dmp_enabled = 'inherits'
     and c.agency_status = 'true'
     and c.employee_status = 'true'
     and c.organization_id = 100048) cc
on adcts.employee_id = cc.employee_id
order by adcts.employee_id asc
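I have two questions:
1. Is my query correct?
2. Because I am using a full outer join, I get multiple entries for the same date. Can someone suggest a better way to get this result, perhaps with a different query?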
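
For reference, the duplicates most likely arise because both join conditions in the query match on employee_id only. Each per-date row from one subquery is therefore paired with every per-date row of the same employee from the others. One possible alternative, shown here as an untested sketch, is to stack the three per-table counts with UNION ALL and sum them per employee and date. The table names, columns, and filter values are copied from the query above; the aliases u and cnt are made up for illustration, and the sketch assumes each employee_id carries the same employee_name in all three tables.

```sql
-- Sketch only: one row per (employee, date), with counts summed across tables.
select u.employee_id,
       u.employee_name,
       u.event_date,
       sum(u.cnt) as event_count
from (
    -- Daily counts from each table, all exposed under the same column names.
    select employee_id, employee_name, event_date, count(*) as cnt
    from mm_events
    where organization_id = 100048
      and event_date between '2016-02-01' and '2016-02-04'
    group by employee_id, employee_name, event_date

    union all

    select employee_id, employee_name, impression_date as event_date, count(*) as cnt
    from mm_impressions
    where organization_id = 100048
      and impression_date between '2016-02-01' and '2016-02-04'
    group by employee_id, employee_name, impression_date

    union all

    select employee_id, employee_name, event_date, count(*) as cnt
    from mm_attributed_events
    where organization_id = 100048
      and event_date between '2016-02-01' and '2016-02-04'
      and event_type = 'click'
    group by employee_id, employee_name, event_date
) u
join (
    -- Same employee filter as in the original query.
    select distinct c.employee_id
    from default.t1_meta_dmp c
    where c.employee_dmp_enabled = 'inherits'
      and c.agency_dmp_enabled = 'inherits'
      and c.agency_status = 'true'
      and c.employee_status = 'true'
      and c.organization_id = 100048
) cc
on u.employee_id = cc.employee_id
group by u.employee_id, u.employee_name, u.event_date
order by employee_id, event_date
```

If the full outer joins are kept instead, the date would probably need to be part of both join conditions, and the second join would have to compare against the coalesced id and date of the first two subqueries, because Evts can be NULL on rows that come only from Imps.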