
This one is giving me a headache! :P

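I have an assignments table, and I want to calculate each member's total duration based on their assignments. In its simplified form, this would be relatively straightforward.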

-------------------------------------------------------------------------
| id    | member_id | unit_id   | start_date    | end_date  |
-------------------------------------------------------------------------
| 1 | 2     | 23        | 2013-01-01    | 2013-02-01    |
-------------------------------------------------------------------------
| 2 | 2     | 25        | 2013-02-01    | 2013-03-01    |
-------------------------------------------------------------------------
| 3 | 2     | 27        | 2013-03-01    | NULL      |
-------------------------------------------------------------------------

It would just be a matter of doing a DATEDIFF() on start_date and end_date and a SUM(). The problem is that a member can have concurrent assignments.

-------------------------------------------------------------------------
| id    | member_id | unit_id   | start_date    | end_date  |
-------------------------------------------------------------------------
| 1 | 2     | 23        | 2013-01-01    | 2013-02-01    |
-------------------------------------------------------------------------
| 2 | 2     | 25        | 2013-02-01    | 2013-03-01    |
-------------------------------------------------------------------------
| 3 | 2     | 30        | 2013-02-15    | 2013-03-01    |*
-------------------------------------------------------------------------
| 4 | 2     | 27        | 2013-03-01    | NULL      |
-------------------------------------------------------------------------

Now I have to somehow recognize that #3 happens at the same time as #2, so I shouldn't add it to the SUM().
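To make the overcount concrete, here is a minimal Python sketch (not part of the original question) that sums each row's own length the way SUM(DATEDIFF(...)) would. The function name `naive_total_days` and the fixed "current" date are assumptions chosen only so the numbers are reproducible.

```python
from datetime import date

# Rows from the table with the overlapping assignment: (id, start_date, end_date).
# None stands for SQL NULL, i.e. the assignment is still current.
ASSIGNMENTS = [
    (1, date(2013, 1, 1), date(2013, 2, 1)),
    (2, date(2013, 2, 1), date(2013, 3, 1)),
    (3, date(2013, 2, 15), date(2013, 3, 1)),
    (4, date(2013, 3, 1), None),
]


def naive_total_days(rows, today):
    """Sum every row's own length, ignoring overlaps between rows."""
    return sum(((end or today) - start).days for _, start, end in rows)


# Assumed "current" date (hypothetical), fixed so the result does not drift.
today = date(2013, 9, 2)
naive = naive_total_days(ASSIGNMENTS, today)
calendar_days = (today - date(2013, 1, 1)).days

# The naive sum exceeds the real calendar span by the length of assignment #3.
print(naive, calendar_days, naive - calendar_days)
```

The surplus equals the 14 days of assignment #3, which is exactly the double counting described above.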

Going a step further, what if there is a gap in the member's duration?

-------------------------------------------------------------------------
| id    | member_id | unit_id   | start_date    | end_date  |
-------------------------------------------------------------------------
| 1 | 2     | 23        | 2013-01-01    | 2013-02-01    |
-------------------------------------------------------------------------
| 2 | 2     | 25        | 2013-02-01    | 2013-02-05    |*
-------------------------------------------------------------------------
| 3 | 2     | 30        | 2013-02-15    | 2013-03-01    |*
-------------------------------------------------------------------------
| 4 | 2     | 27        | 2013-03-01    | NULL      |
-------------------------------------------------------------------------

Also, NULL means "current", so it should be treated as CURDATE().

Any ideas?


2 Answers


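Here is the idea. Split each record into two, to get a list of the dates on which assignments start and end. Then determine how many assignments are active on a given date: basically, add "1" for each start and "-1" for each end, and take the cumulative sum.

Next, you need to find the next date for each row, to get the period it covers, before doing the final aggregation.

The first part is handled by this query: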

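select member_id, thedate,
       -- running count of active assignments, reset when the member changes
       @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
       @prevmemberid := member_id
from (select member_id, start_date as thedate, 1 as isstart
      from assignments
      union all
      -- a NULL end_date means "current", so treat it as today
      select member_id, coalesce(end_date, CURDATE()), -1 as isstart
      from assignments
      -- starts sort before ends on the same date
      order by member_id, thedate, isstart desc
     ) a cross join
     (select @sumstart := 0, @prevmemberid := NULL) const;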
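The same first step can be sketched outside SQL. This is an illustrative Python version for a single member, not the answer's own code: the name `running_counts` and the fixed `TODAY` value are assumptions, and the per-member reset that the query does with `@prevmemberid` is left out.

```python
from datetime import date


def running_counts(rows, today):
    """Return (date, active_count) pairs for one member, in date order.

    rows holds (start_date, end_date) pairs; end_date may be None (SQL NULL).
    """
    events = []
    for start, end in rows:
        events.append((start, 1))            # an assignment starts
        events.append((end or today, -1))    # an assignment ends (NULL -> today)
    # Sort by date; on equal dates put starts (+1) before ends (-1).
    events.sort(key=lambda event: (event[0], -event[1]))

    counts = []
    active = 0
    for day, delta in events:
        active += delta                      # cumulative sum of +1 / -1
        counts.append((day, active))
    return counts


TODAY = date(2013, 9, 2)  # assumed "current" date, hypothetical
ROWS = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 3, 1)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
for day, active in running_counts(ROWS, TODAY):
    print(day, active)
```

A count above zero means the member has at least one assignment on that date; the count only returns to zero at the very end, because the overlapping rows leave no gap.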

The rest uses more variables:

select member_id,
       -- count the days of every period in which at least one assignment is active
       sum(case when sumstart > 0 then datediff(nextdate, thedate) end) as daysactive
from (select member_id, thedate, sumstart,
             -- rows arrive latest-first, so the previous row holds the next date
             if(@nextmemberid = member_id, @nextdate, NULL) as nextdate,
             @nextmemberid := member_id,
             @nextdate := thedate
      from (select member_id, thedate,
                   @sumstart := if(@prevmemberid = member_id, @sumstart + isstart, isstart) as sumstart,
                   @prevmemberid := member_id
            from (select member_id, start_date as thedate, 1 as isstart
                  from assignments
                  union all
                  select member_id, coalesce(end_date, CURDATE()), -1 as isstart
                  from assignments
                  -- starts sort before ends on the same date
                  order by member_id, thedate, isstart desc
                 ) a cross join
                 (select @sumstart := 0, @prevmemberid := NULL) const
           ) a cross join
           (select @nextmemberid := NULL, @nextdate := NULL) const
      -- on equal dates, take the lowest count first
      order by member_id, thedate desc, sumstart
     ) a
group by member_id;

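I don't like using variables this way, because MySQL does not guarantee the order in which the expressions of a given select are evaluated. In practice, though, they are evaluated in the order written (and this query depends on that order). This can be written without variables, but without with statements, window functions, or even views that accept subqueries in the from clause, the resulting SQL is going to be ugly.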
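For comparison, the whole calculation (running count plus "days until the next date") can be sketched in a few lines of Python, where evaluation order is not in question. This is an illustration under assumptions, not a translation of the query: `active_days` and `TODAY` are hypothetical names, and it handles one member at a time.

```python
from datetime import date


def active_days(rows, today):
    """Count the days on which at least one assignment is active.

    rows holds (start_date, end_date) pairs; end_date may be None (SQL NULL).
    """
    events = []
    for start, end in rows:
        events.append((start, 1))
        events.append((end or today, -1))
    # Sort by date; on equal dates put starts (+1) before ends (-1).
    events.sort(key=lambda event: (event[0], -event[1]))

    total = 0
    active = 0
    prev_day = None
    for day, delta in events:
        if active > 0:
            # The stretch from the previous event to this one was covered.
            total += (day - prev_day).days
        active += delta
        prev_day = day
    return total


TODAY = date(2013, 9, 2)  # assumed "current" date, hypothetical
OVERLAP_ROWS = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 3, 1)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
GAP_ROWS = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 2, 5)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
print(active_days(OVERLAP_ROWS, TODAY))
print(active_days(GAP_ROWS, TODAY))
```

With the gap table, the ten days between 2013-02-05 and 2013-02-15 are skipped because the running count is zero over that stretch.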

answered 2013-09-02T19:23:34.807

I think it's easier to filter out the overlapping assignments in code than in SQL. You can retrieve all assignments for a given member_id, sorted by start_date:

select * from assignments where member_id='2' order by start_date asc

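Then you can loop over these assignments and filter out the overlapping ones. Two assignments A and B do not overlap if A ends before B starts or B ends before A starts.

Because the results are sorted by start date, we can safely ignore the second case: B never starts before A, so it cannot end before A starts. We then get something like: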

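for i = 0 .. assignments.length - 1
    if (assignments[i] == null) continue  // already merged into an earlier one
    for j = i+1 .. assignments.length - 1
        if (assignments[j] != null && assignments[j].start_date < assignments[i].end_date)
            // it overlaps -> keep the later end date, then get rid of it
            assignments[i].end_date = max(assignments[i].end_date, assignments[j].end_date)
            assignments[j] = null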

Then loop over the assignments and sum the durations of the non-null ones. That should be easy.
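A minimal Python sketch of this filter-then-sum approach, written as a single pass over the sorted rows. It merges overlapping assignments rather than simply discarding them, so a partial overlap that ends later still counts in full. The name `merged_days` and the `today` parameter (standing in for a NULL end_date) are assumptions for illustration.

```python
from datetime import date


def merged_days(rows, today):
    """Merge overlapping assignments and sum the merged durations.

    rows holds (start_date, end_date) pairs; end_date may be None (SQL NULL).
    """
    # Replace NULL end dates with today, then sort by start date.
    spans = sorted((start, end or today) for start, end in rows)

    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # No overlap with the current span: close it and open a new one.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current span if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total


TODAY = date(2013, 9, 2)  # assumed "current" date, hypothetical
GAP_ROWS = [
    (date(2013, 1, 1), date(2013, 2, 1)),
    (date(2013, 2, 1), date(2013, 2, 5)),
    (date(2013, 2, 15), date(2013, 3, 1)),
    (date(2013, 3, 1), None),
]
print(merged_days(GAP_ROWS, TODAY))
```

Because the rows are sorted first, only the most recent merged span has to be compared against each new row, which avoids the nested loop.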

answered 2013-09-02T19:29:52.837