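I'm working on an RFM analysis and, with a lot of help, was able to put together the following query. It outputs customer_id, the combined RFM score, and then the individual R, F, and M scores: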
--This will first create quintiles using the ntile function
--Then factor in the conditions
--Then combine the score
--Then the substrings will separate the combined score back into its individual scores
SELECT *,
    SUBSTRING(rfm_combined, 1, 1) AS recency_score,
    SUBSTRING(rfm_combined, 2, 1) AS frequency_score,
    SUBSTRING(rfm_combined, 3, 1) AS monetary_score
FROM (
    -- Combine the three quintiles into one three-digit score, e.g. 5, 3, 1 -> 531
    SELECT
        customer_id,
        rfm_recency * 100 + rfm_frequency * 10 + rfm_monetary AS rfm_combined
    FROM (
        -- Rank every customer into quintiles (1-5) on each measure
        SELECT
            customer_id,
            NTILE(5) OVER (ORDER BY last_order_date) AS rfm_recency,
            NTILE(5) OVER (ORDER BY count_order) AS rfm_frequency,
            NTILE(5) OVER (ORDER BY total_spent) AS rfm_monetary
        FROM (
            -- One row per customer: last order date, order count, total spend
            SELECT
                customer_id,
                MAX(oms_order_date) AS last_order_date,
                COUNT(*) AS count_order,
                SUM(quantity_ordered * unit_price_amount) AS total_spent
            FROM
                l_dmw_order_report
            WHERE
                order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
                AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
                AND line_status NOT IN ('CANCELLED', 'HOLD')
                AND oms_order_date BETWEEN '2019-01-01' AND CURRENT_DATE
                AND customer_id = 'US621111112234061'
            GROUP BY customer_id
        ) customer_totals
    ) quintiles
) rfm
-- Sort in the outermost query; an ORDER BY inside a subquery is not guaranteed to be kept
ORDER BY customer_id DESC
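Above, you'll notice that I'm forcing the query to output a single, specific customer_id. That's because I wanted to test whether the query accounts for a customer_id that appears in multiple YearMonth categories (a customer might purchase in January, then again in February, then again in November).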
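The problem is that although the query outputs the correct scores, it only seems to count the customer_id once, regardless of how many months it appears in. This particular customer ID appears in January 2019, February 2019, and November 2019, so I expect 3 rows instead of 1. I've been testing for hours and can't find the cause, but I suspect my grouping may be wrong.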
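To make the grouping suspicion concrete: the innermost query groups by customer_id alone, so all of a customer's orders collapse into one row before the NTILE step ever runs. Below is a rough sketch of a per-month version of that innermost query. It assumes a PostgreSQL/Redshift-style DATE_TRUNC function is available and that oms_order_date is a date or timestamp column; neither is confirmed here, and order_month is a made-up alias.

```sql
-- Sketch only: aggregate per customer AND per calendar month.
-- Assumes DATE_TRUNC exists in this dialect (PostgreSQL/Redshift style).
SELECT
    customer_id,
    DATE_TRUNC('month', oms_order_date) AS order_month,  -- hypothetical alias
    MAX(oms_order_date) AS last_order_date,
    COUNT(*) AS count_order,
    SUM(quantity_ordered * unit_price_amount) AS total_spent
FROM
    l_dmw_order_report
WHERE
    order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
    AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
    AND line_status NOT IN ('CANCELLED', 'HOLD')
    AND oms_order_date BETWEEN '2019-01-01' AND CURRENT_DATE
    AND customer_id = 'US621111112234061'
-- Grouping by the month as well keeps one row per customer per month
GROUP BY
    customer_id,
    DATE_TRUNC('month', oms_order_date)
```

Grouped this way, the test customer should come back with one row for each month that has qualifying orders. If the scores are meant to be ranked within each month, the NTILE windows in the next level up would probably also need a PARTITION BY on the month column, and order_month would have to be carried through the outer SELECT lists. That is a design choice rather than something established here.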
Thanks for your help, and let me know if you have any questions!!
Best,
Z