I have a table that contains a list of accounts by month along with a field that indicates activity. I want to search through to find when an account has "died", based on the following criteria:

  1. the account had consistent activity for a contiguous period of months
  2. the account had a spike of activity in its final month (spike = 200% or more of the average of all previous contiguous months of activity)
  3. the month immediately following the spike of activity and the next 12 months all had 0 activity

So the table might look something like this:

ID | Date      | Activity
1  | 1/1/2010  | 2
2  | 1/1/2010  | 3.2
1  | 2/3/2010  | 3
2  | 2/3/2010  | 2.7
1  | 3/2/2010  | 8
2  | 3/2/2010  | 9
1  | 4/6/2010  | 0
2  | 4/6/2010  | 0
1  | 5/2/2010  | 0
2  | 5/2/2010  | 2

So in this case both accounts 1 and 2 have activity in months Jan - Mar. Both accounts exhibit a spike of activity in March. Both accounts have 0 activity in April. Account 2 has activity again in May, but account 1 does not. Therefore, my query should return Account 1, but not Account 2. I would want to see this as my query result:

ID | Last Date
1  | 3/2/2010 
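
The spike rule in criterion 2 is plain arithmetic, and a small sketch makes it concrete against the sample rows above. This is only an illustration in Python: the function name `is_spike` and the `factor` parameter are hypothetical, and it assumes the preceding contiguous months have already been collected into a list.

```python
from statistics import mean

def is_spike(previous_months, final_month, factor=2.0):
    """Return True when final_month is at least `factor` times the mean
    of the preceding contiguous active months (criterion 2)."""
    if not previous_months:
        # No history to average, so criterion 1 is not met either.
        return False
    return final_month >= factor * mean(previous_months)

# Account 1: Jan = 2, Feb = 3, Mar = 8 (mean 2.5, threshold 5.0)
print(is_spike([2, 3], 8))
# Account 2: Jan = 3.2, Feb = 2.7, Mar = 9 (mean 2.95, threshold 5.9)
print(is_spike([3.2, 2.7], 9))
```

Both March values clear their thresholds, which is why criterion 3 is what separates account 1 from account 2.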

I realize this is a complicated question and I'm not expecting anyone to write the whole query for me. The best approach I can think of so far is to create a series of subqueries and join them, but I don't even know what the subqueries would look like. For example: how do I look for a contiguous series of rows for a single ID where activity is all 0 (or all non-zero)?

My fallback, if the SQL is simply too involved, is a brute-force search in Java: first find all unique IDs, then for each unique ID iterate across the months to determine if and when the ID "died".
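
The brute-force scan described above could look roughly like the sketch below. It is written in Python rather than Java purely for brevity, and every name in it (`find_dead_accounts`, `month_index`, `QUIET_MONTHS`) is hypothetical. It assumes one row per account per month, and it assumes that a month with no row at all counts as zero activity, which is what the expected result for the sample data implies.

```python
from collections import defaultdict
from datetime import date

QUIET_MONTHS = 13  # the month after the spike plus the next 12 (criterion 3)

def month_index(d):
    """Map a date to a running month number so months can be compared."""
    return d.year * 12 + d.month

def find_dead_accounts(rows):
    """rows: iterable of (account_id, date, activity) tuples.
    Returns {account_id: date of the final spike}.
    ASSUMPTION: a missing month counts as zero activity."""
    by_account = defaultdict(list)
    for account_id, when, activity in rows:
        by_account[account_id].append((when, activity))

    dead = {}
    for account_id, history in by_account.items():
        history.sort()  # oldest month first
        for i, (when, activity) in enumerate(history):
            if activity == 0:
                continue
            # Criterion 1: walk back over the unbroken run of active months.
            run = []
            expected = month_index(when) - 1
            j = i - 1
            while (j >= 0 and history[j][1] != 0
                   and month_index(history[j][0]) == expected):
                run.append(history[j][1])
                expected -= 1
                j -= 1
            if not run:
                continue
            # Criterion 2: at least 200% of the run's average.
            if activity < 2 * sum(run) / len(run):
                continue
            # Criterion 3: only zeros in the following 13 months.
            start = month_index(when)
            quiet = all(
                later_activity == 0
                for later_when, later_activity in history[i + 1:]
                if month_index(later_when) - start <= QUIET_MONTHS
            )
            if quiet:
                dead[account_id] = when
    return dead

sample = [
    (1, date(2010, 1, 1), 2), (2, date(2010, 1, 1), 3.2),
    (1, date(2010, 2, 3), 3), (2, date(2010, 2, 3), 2.7),
    (1, date(2010, 3, 2), 8), (2, date(2010, 3, 2), 9),
    (1, date(2010, 4, 6), 0), (2, date(2010, 4, 6), 0),
    (1, date(2010, 5, 2), 0), (2, date(2010, 5, 2), 2),
]
print(find_dead_accounts(sample))
```

For the sample rows this returns only account 1 with 2010-03-02, matching the desired result. If an account could die more than once, the dictionary keeps only the latest hit.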

Once again: any help to move in the right direction is very much appreciated.

2 Answers

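Doing it in Java, or doing part of it in SQL and finishing the processing in Java, is a good approach.

I'm not going to tackle how to define the spike.

I suggest you start with condition 3. It's easy to find the last non-zero value. That row is then the one to test for a spike, and for consistent data leading up to it.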

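SELECT cur.*
FROM monthly_activity cur
  LEFT OUTER JOIN monthly_activity later
    ON cur.ID = later.ID AND cur.Date < later.Date AND later.Activity <> 0
WHERE later.Date IS NULL     -- no later month with activity
  AND cur.Activity <> 0      -- the row itself must be active, or trailing zero rows match too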

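That works, but you don't want a row returned merely because it is the last record on file for the account, so also require a zero-activity record after it: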

SELECT cur.ID, cur.Date
FROM monthly_activity cur
  INNER JOIN monthly_activity later
    ON cur.ID = later.ID AND cur.Date < later.Date AND later.Activity = 0
GROUP BY cur.ID, cur.Date    -- one row per candidate, however many zero months follow
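
To try the idea without a database server, the sketch below runs it through Python's built-in `sqlite3` module against the question's sample rows. Treat it as one possible reading of this answer rather than a definitive version: the table name `monthly_activity` comes from the answer, but the aliases, the ISO-formatted date strings (so that text comparison orders them correctly), and the `EXISTS` clause used to fold both conditions into one statement are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_activity (ID INTEGER, Date TEXT, Activity REAL)")
conn.executemany(
    "INSERT INTO monthly_activity VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", 2), (2, "2010-01-01", 3.2),
        (1, "2010-02-03", 3), (2, "2010-02-03", 2.7),
        (1, "2010-03-02", 8), (2, "2010-03-02", 9),
        (1, "2010-04-06", 0), (2, "2010-04-06", 0),
        (1, "2010-05-02", 0), (2, "2010-05-02", 2),
    ],
)

CANDIDATES_SQL = """
SELECT cur.ID, cur.Date
FROM monthly_activity cur
  LEFT OUTER JOIN monthly_activity later
    ON cur.ID = later.ID AND cur.Date < later.Date AND later.Activity <> 0
WHERE later.Date IS NULL          -- no later row with activity
  AND cur.Activity <> 0           -- the row itself is active
  AND EXISTS (                    -- and at least one zero row follows it
        SELECT 1 FROM monthly_activity z
        WHERE z.ID = cur.ID AND z.Date > cur.Date AND z.Activity = 0)
ORDER BY cur.ID
"""

candidates = conn.execute(CANDIDATES_SQL).fetchall()
print(candidates)
```

On the sample data only account 1's March row survives. Account 2's last active row is May, and nothing follows it, so it drops out. The spike test and the 12-month window still have to be applied to these candidates.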
answered 2012-10-25T19:17:13.817

Probably not the most efficient code in the world, but I think this does what you're after:

declare @t table (AccountId int, ActivityDate date, Activity float)

insert @t 
      select 1,   '2010-01-01', 2
union select 2,   '2010-01-01', 3.2
union select 1,   '2010-02-03', 3
union select 2,   '2010-02-03', 2.7
union select 1,   '2010-03-02', 8
union select 2,   '2010-03-02', 9
union select 1,   '2010-04-06', 0
union select 2,   '2010-04-06', 0
union select 1,   '2010-05-02', 0
union select 2,   '2010-05-02', 2


select AccountId, ActivityDate LastActivityDate --, Activity
from @t a
where 
--Part 2 --select only where the activity is a peak
Activity >= isnull
(
    (
        select 2 * avg(c.Activity)
        from @t c
        where c.AccountId = a.AccountId --same account as the candidate row
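        and c.Activity <> 0 --average only active months, not the zero month that bounds the run
        and c.ActivityDate >= isnull
        (
            (
                --most recent zero-activity month before the candidate row
                select max(d.ActivityDate)
                from @t d
                where d.AccountId = a.AccountId
                and d.ActivityDate < a.ActivityDate
                and d.Activity = 0
            )
            ,
            (
                --no earlier zero month: the run starts at the account's first record
                select min(e.ActivityDate)
                from @t e
                where e.AccountId = a.AccountId
            )
        )
        and c.ActivityDate < a.ActivityDate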
    )
    , Activity + 1 --Part 1 (i.e. if no activity before today don't include the result)
)
--Part 3
and not exists --select only dates which have had no activity for the following 12 months on the same account (assumption: count no record as no activity / also ignore current date in this assumption)
(
    select 1
    from @t b
    where a.AccountId = b.AccountId
    and b.Activity > 0
    and b.ActivityDate between dateadd(DAY, 1, a.ActivityDate) and dateadd(YEAR, 1, a.ActivityDate)
)
answered 2012-10-25T19:48:03.470