
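I'm trying to figure out whether the query I want to run is possible, or at least feasible, in SQL, or whether I need to collect the raw data and process it in my application.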

My schema looks like this:

applications
================
id INT

application_steps
=================
id INT
application_id INT
step_id INT
activated_at DATE
completed_at DATE

steps
=====
id INT
step_type_id INT

Given this data in application_steps:

| id | application_id | step_id | activated_at | completed_at |
| 1  | 1              | 1       | 2013-01-01   | 2013-01-02   |
| 2  | 1              | 2       | 2013-01-02   | 2013-01-02   |
| 3  | 1              | 3       | 2013-01-02   | 2013-01-10   |
| 4  | 1              | 4       | 2013-01-10   | 2013-01-11   |
| 5  | 2              | 1       | 2013-02-02   | 2013-02-02   |
| 6  | 2              | 2       | 2013-02-02   | 2013-02-07   |
| 7  | 2              | 4       | 2013-02-09   | 2013-02-11   |

I'd like to get this result:

| application_id | step_1_days | step_2_days | step_3_days | step_4_days |
| 1              | 1           | 0           | 8           | 1           |
| 2              | 0           | 5           | NULL        | 2           |

Note that in reality there are many more steps and applications that I'll be looking at.

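As you can see, there is a one-to-many relationship between applications and application_steps. It is also possible for a given step not to be used for a particular application. I'd like to get the time spent on each step (using DATEDIFF(completed_at, activated_at)), all in one row (the column names don't matter). Is this possible at all?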
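For comparison with the pure-SQL route, the application-side alternative raised at the top of the question can be sketched in Python: fetch the raw rows and pivot them in memory. This is a minimal sketch, assuming the rows arrive as `(application_id, step_id, activated_at, completed_at)` with ISO date strings; `step_days_by_application` and `RAW_ROWS` are hypothetical names, and the sample rows are the ones from the table above.

```python
from datetime import date

# Sample rows from the question, as they might come back from a plain
# "select application_id, step_id, activated_at, completed_at from application_steps".
RAW_ROWS = [
    (1, 1, "2013-01-01", "2013-01-02"),
    (1, 2, "2013-01-02", "2013-01-02"),
    (1, 3, "2013-01-02", "2013-01-10"),
    (1, 4, "2013-01-10", "2013-01-11"),
    (2, 1, "2013-02-02", "2013-02-02"),
    (2, 2, "2013-02-02", "2013-02-07"),
    (2, 4, "2013-02-09", "2013-02-11"),
]


def step_days_by_application(rows, step_ids):
    """Return {application_id: [days per step, in step_ids order; None if the step was unused]}."""
    position = {step_id: i for i, step_id in enumerate(step_ids)}
    result = {}
    for application_id, step_id, activated_at, completed_at in rows:
        # One list per application, pre-filled with None for steps it never used.
        days = result.setdefault(application_id, [None] * len(step_ids))
        if step_id in position:
            # Same quantity as DATEDIFF(completed_at, activated_at): whole days.
            elapsed = date.fromisoformat(completed_at) - date.fromisoformat(activated_at)
            days[position[step_id]] = elapsed.days
    return result


if __name__ == "__main__":
    print(step_days_by_application(RAW_ROWS, [1, 2, 3, 4]))
```

Run on the sample data, this prints `{1: [1, 0, 8, 1], 2: [0, 5, None, 2]}`, the same values as the desired result table. With about 44,000 rows this loop is cheap; the trade-off is transferring the raw rows instead of one pre-aggregated row per application.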

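Second question: to make things a bit more complicated, I also need a secondary query that only selects the application_steps whose steps have a particular step_type_id. Assuming the first part is possible, how would I extend it to filter efficiently?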

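Note: efficiency is key here. This is for a yearly report, which amounts to roughly 2,500 applications, with 70 distinct steps and 44,000 application_steps in production (not a lot of data, but potentially a lot once the joins are taken into account).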


1 Answer


This should be a basic "pivot" aggregation:

select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s
group by application_id;
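The pivot can be checked end to end with Python's built-in `sqlite3` module on the sample rows from the question. This is a sketch under stated assumptions: SQLite has no MySQL-style `datediff`, so the day difference is computed as `julianday(completed_at) - julianday(activated_at)` cast to an integer, and it assumes at most one row per (application, step). `pivot_step_days` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (id, application_id, step_id, activated_at, completed_at)
ROWS = [
    (1, 1, 1, "2013-01-01", "2013-01-02"),
    (2, 1, 2, "2013-01-02", "2013-01-02"),
    (3, 1, 3, "2013-01-02", "2013-01-10"),
    (4, 1, 4, "2013-01-10", "2013-01-11"),
    (5, 2, 1, "2013-02-02", "2013-02-02"),
    (6, 2, 2, "2013-02-02", "2013-02-07"),
    (7, 2, 4, "2013-02-09", "2013-02-11"),
]

# SQLite stand-in (assumption) for MySQL's datediff(completed_at, activated_at): whole days.
DAYS = "cast(julianday(completed_at) - julianday(activated_at) as integer)"

# Conditional aggregation: each CASE is non-NULL only on the row for its step,
# so MAX returns that row's value, or NULL when the application never used the step.
PIVOT_SQL = f"""
select application_id,
       max(case when step_id = 1 then {DAYS} end) as step_1_days,
       max(case when step_id = 2 then {DAYS} end) as step_2_days,
       max(case when step_id = 3 then {DAYS} end) as step_3_days,
       max(case when step_id = 4 then {DAYS} end) as step_4_days
from application_steps
group by application_id
order by application_id
"""


def pivot_step_days(rows):
    """Load the rows into an in-memory table and run the pivot query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table application_steps (id integer primary key, application_id integer,"
            " step_id integer, activated_at text, completed_at text)"
        )
        conn.executemany("insert into application_steps values (?, ?, ?, ?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in pivot_step_days(ROWS):
        print(row)
```

This prints `(1, 1, 0, 8, 1)` and `(2, 0, 5, None, 2)`, one row per application, matching the requested result. The whole table is scanned once and grouped once, with no self-joins per step.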

You would have to repeat this for all 70 steps.
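Rather than typing 70 `case` lines by hand, the column list can be generated from the step ids, for example from the result of a separate `select id from steps` run first. A minimal sketch, with `build_pivot_sql` as a hypothetical helper name; it emits a MySQL pivot query of the same shape as the one above, grouped by `application_id`, and only interpolates values passed through `int()`, so the ids cannot inject arbitrary SQL.

```python
def build_pivot_sql(step_ids):
    """Build a MySQL pivot query with one step_<id>_days column per step id."""
    columns = ",\n".join(
        # int() guards against anything other than an integer id reaching the SQL text.
        f"       max(case when step_id = {int(step_id)} "
        f"then datediff(completed_at, activated_at) end) as step_{int(step_id)}_days"
        for step_id in step_ids
    )
    return (
        "select application_id,\n"
        + columns
        + "\nfrom application_steps\ngroup by application_id;"
    )


if __name__ == "__main__":
    # e.g. ids 1..70, standing in for the result of "select id from steps"
    print(build_pivot_sql(range(1, 71)))
```

Generating the text in the application keeps the SQL itself a single pass over `application_steps`; the column set just has to be known before the query is sent.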

To do this only for steps of a particular type:

select application_id,
       max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
       max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
       max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
       max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s join
     steps
     on s.step_id = steps.id and
        steps.step_type_id = XXX
group by application_id;
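The type filter can be exercised the same way in SQLite. This sketch rests on assumptions: the question gives no step types, so the `STEPS` rows below are hypothetical (steps 1 and 3 as type 10, steps 2 and 4 as type 20); `julianday` again stands in for MySQL's `datediff`; and the `XXX` placeholder becomes a bound `?` parameter. `pivot_for_step_type` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (id, application_id, step_id, activated_at, completed_at)
APPLICATION_STEPS = [
    (1, 1, 1, "2013-01-01", "2013-01-02"),
    (2, 1, 2, "2013-01-02", "2013-01-02"),
    (3, 1, 3, "2013-01-02", "2013-01-10"),
    (4, 1, 4, "2013-01-10", "2013-01-11"),
    (5, 2, 1, "2013-02-02", "2013-02-02"),
    (6, 2, 2, "2013-02-02", "2013-02-07"),
    (7, 2, 4, "2013-02-09", "2013-02-11"),
]

# Hypothetical step types (not in the question): rows are (id, step_type_id).
STEPS = [(1, 10), (2, 20), (3, 10), (4, 20)]

# SQLite stand-in (assumption) for MySQL's datediff(s.completed_at, s.activated_at).
DAYS = "cast(julianday(s.completed_at) - julianday(s.activated_at) as integer)"

# Same pivot; the inner join drops application_steps rows whose step has a
# different step_type_id, so those columns come out NULL.
FILTERED_PIVOT_SQL = f"""
select s.application_id,
       max(case when s.step_id = 1 then {DAYS} end) as step_1_days,
       max(case when s.step_id = 2 then {DAYS} end) as step_2_days,
       max(case when s.step_id = 3 then {DAYS} end) as step_3_days,
       max(case when s.step_id = 4 then {DAYS} end) as step_4_days
from application_steps s
join steps on s.step_id = steps.id and steps.step_type_id = ?
group by s.application_id
order by s.application_id
"""


def pivot_for_step_type(step_type_id):
    """Build both tables in memory and run the filtered pivot for one step type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table application_steps (id integer primary key, application_id integer,"
            " step_id integer, activated_at text, completed_at text)"
        )
        conn.execute("create table steps (id integer primary key, step_type_id integer)")
        conn.executemany("insert into application_steps values (?, ?, ?, ?, ?)", APPLICATION_STEPS)
        conn.executemany("insert into steps values (?, ?)", STEPS)
        return conn.execute(FILTERED_PIVOT_SQL, (step_type_id,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(pivot_for_step_type(10))
```

With these assumed types, `pivot_for_step_type(10)` returns `[(1, 1, None, 8, None), (2, 0, None, None, None)]`. Because the join is an inner join, an application with no step of the requested type gets no row at all rather than a row of NULLs.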
answered 2013-09-04T01:32:10.610