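I have managed to use CTEs to calculate whether a customer is active in a given monthly period and then inactive in the next period (churn). So far this has proved quite straightforward. The snippet I used (for anyone else looking for how to do this) is below. My dwh.marts.fact_customer_kpi table has records that mark a customer as active, meaning that he/she has spent some money using the service.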

with monthly_usage as (
  select
    userid as who_identifier,
    datediff(month, '1970-01-01', date) as time_period,
    date_part(mon,date) as month,
    date_part(yr,date) as year,
    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME) as day
  from dwh.marts.fact_customer_kpi as k
          inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    and (datediff(month, CURRENT_DATE, registration_date)*-1) > 1
  group by 1,2,3,4,5
  order by 1,2,3,4,5)
,

lag_lead as (
  select who_identifier,
  time_period,
  year,
  month,
  day,
    lag(time_period,1) over (partition by who_identifier order by who_identifier, time_period),
    lead(time_period,1) over (partition by who_identifier order by who_identifier, time_period)
  from monthly_usage)

,

lag_lead_with_diffs as (
  select who_identifier,
    year,
    month,
    day,
    time_period,
    lag,
    lead,
    time_period-lag lag_size,
    lead-time_period lead_size
  from lag_lead)
,

calculated as (
select time_period,
  year,
  month,
  day,
  case when lag is null then 'NEW ACTIVE'
     when lag_size = 1 then 'ACTIVE'
     when lag_size > 1 then 'REACTIVATED'
  end as this_month_value,
  case when (lead_size > 1 OR lead_size IS NULL) then 'CHURN'
     else NULL
  end as next_month_churn,
  who_identifier,
  count(who_identifier) as countIdentifier
   from lag_lead_with_diffs group by 1,2,3,4,5,6,7)

select time_period,
    day,
  this_month_value,
  who_identifier,
  next_month_churn,
  sum(countIdentifier) as countIdentifier
  from calculated  group by 1,2,3,4,5
union
  select time_period+1,
  dateadd(month,1,day),
  'CHURN',
  who_identifier,
  next_month_churn,
  countIdentifier
  from calculated where next_month_churn is not null
order by 1;

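However, I now want to know whether there is an efficient way in Redshift to calculate the periods based on a specific date. For example, calculating the same values as above over 7-day periods following the customer's registration, rather than by month.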

The change needed in my query would be in monthly_usage. I have tried using - interval '7 days' but without success so far, or I am missing something.

Can anyone point out what I am missing (preferably with an example), or what changes are needed?

I am using Amazon Redshift.

1 Answer

Are you perhaps missing the date_trunc function? It feels like it.

You can replace this:

    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME)as day

You can do this instead:

    date_trunc('month', date)
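
To show where that expression would sit, here is a rough sketch of the monthly_usage CTE with the concatenate-and-cast block swapped for date_trunc. The table and column names come from the question; the only other change is that the registration filter is written with the datediff arguments reversed instead of multiplying by -1, which should be equivalent. Note that date_trunc returns a timestamp, so this assumes the downstream CTEs are happy with that type for day.

```sql
-- Sketch only: monthly_usage from the question, with date_trunc for "day".
with monthly_usage as (
  select
    userid as who_identifier,
    datediff(month, '1970-01-01', date) as time_period,
    date_part(mon, date) as month,
    date_part(yr, date) as year,
    date_trunc('month', date) as day   -- first day of the month (timestamp)
  from dwh.marts.fact_customer_kpi as k
  inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    -- same filter as the original, without the "* -1"
    and datediff(month, registration_date, current_date) > 1
  group by 1, 2, 3, 4, 5
)
select * from monthly_usage;
```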

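Then I would parameterize it using some decent language, so that other dateparts can be swapped in easily. I would probably also replace datediff(month, '1970-01-01', date) with EXTRACT.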
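
For the 7-day periods counted from registration that the question asks about, date_trunc alone will not help, because the windows are anchored to each customer's own registration date rather than to the calendar. One possible approach, as a minimal sketch: number the periods with integer division on the day difference. This assumes registration_date lives on dim_user and date on fact_customer_kpi (the question does not qualify them), and the CTE name weekly_usage is made up for illustration.

```sql
-- Sketch only: 7-day periods counted from each customer's registration date.
-- Assumes u.registration_date and k.date; "weekly_usage" is a hypothetical name.
with weekly_usage as (
  select
    k.userid as who_identifier,
    -- integer division: 0 for days 0-6 after registration, 1 for days 7-13, ...
    datediff(day, u.registration_date, k.date) / 7 as time_period,
    -- start of the 7-day window that the record falls into
    dateadd(
      day,
      cast(datediff(day, u.registration_date, k.date) / 7 as integer) * 7,
      u.registration_date
    ) as day
  from dwh.marts.fact_customer_kpi as k
  inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    and k.date >= u.registration_date   -- ignore activity before registration
  group by 1, 2, 3
)
select * from weekly_usage;
```

Because time_period is still a consecutive integer per customer, the lag/lead logic in the later CTEs should carry over unchanged. The year and month columns would have to be dropped or replaced, and dateadd(month,1,day) in the final union branch would become dateadd(day,7,day).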

answered 2017-05-25T09:05:17.820