
I have a database that tracks the date each transaction takes place, along with the unique buyer that transaction corresponds to. Now I am trying to look at reservations that resulted in the customer purchasing again at a later date. Right now, the code and sample output below show me which customers were repeat customers by the count of their buyer_id, but I also want to be able to see which purchases (reservations) resulted in that same customer purchasing again at a later time (not earlier in time, which a simple COUNT cannot distinguish).

SELECT r.id AS Reservation_id, r.created, r.buyer_id, COUNT(r.buyer_id)
FROM reservations r
GROUP BY r.buyer_id
ORDER BY Reservation_id


 Reservation_id   created               buyer_id   COUNT(r.buyer_id)
 3                2007-08-14 18:28:38   438        1
 7                2007-09-19 12:29:52   474        2
 8                2007-09-19 13:14:54   476        1
 9                2007-09-20 10:22:52   477        1
 10               2007-09-25 15:27:45   485        3
 11               2007-09-26 20:56:25   474        2
 12 .... etc

The goal is to pull additional data about each reservation and then see which factors of service affect whether a customer comes back for a repeat purchase. In the example above, buyer #474 purchased twice, but I want to distinguish the first purchase (after which he/she did come back and purchase again, the second and final purchase) from the second purchase (after which buyer #474 made no further purchases). So the goal is to have an additional output column that shows:

 Reservation_id   created               buyer_id   COUNT(r.buyer_id)   Returning
 3                2007-08-14 18:28:38   438        1                   0
 7                2007-09-19 12:29:52   474        2                   1
 8                2007-09-19 13:14:54   476        1                   0
 9                2007-09-20 10:22:52   477        1                   0
 10               2007-09-25 15:27:45   485        3                   1
 11               2007-09-26 20:56:25   474        2                   0
 12 .... etc

That is, the Returning column shows that customer 474's ID does not appear again after reservation_id 11. I would do this in Excel, but I have a huge number of rows and Excel can't handle the functions over such a large dataset.

Any help or suggestions are appreciated.
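A minimal sketch of the Returning flag described above, assuming the reservations table has the id, created and buyer_id columns used in the query. Because GROUP BY buyer_id collapses each buyer into a single row, this version keeps one row per reservation and computes the buyer's count with a correlated subquery; the flag is an EXISTS test for a strictly later reservation by the same buyer. The SQL runs against an in-memory SQLite database (Python's standard sqlite3 module) loaded with only the six sample rows shown, so it is a stand-in for the real database, not the real one.

```python
import sqlite3

# In-memory stand-in for the real table, holding only the six sample rows
# shown above. Column names follow the question's query; all other columns
# of the real reservations table are omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY, created TEXT, buyer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations (id, created, buyer_id) VALUES (?, ?, ?)",
    [
        (3, "2007-08-14 18:28:38", 438),
        (7, "2007-09-19 12:29:52", 474),
        (8, "2007-09-19 13:14:54", 476),
        (9, "2007-09-20 10:22:52", 477),
        (10, "2007-09-25 15:27:45", 485),
        (11, "2007-09-26 20:56:25", 474),
    ],
)

# One row per reservation (no GROUP BY):
#   buyer_count  - total reservations by the same buyer (correlated subquery)
#   is_returning - 1 if the same buyer has a strictly later reservation, else 0
QUERY = """
SELECT r.id AS reservation_id,
       r.created,
       r.buyer_id,
       (SELECT COUNT(*)
          FROM reservations c
         WHERE c.buyer_id = r.buyer_id) AS buyer_count,
       CASE WHEN EXISTS (SELECT 1
                           FROM reservations later
                          WHERE later.buyer_id = r.buyer_id
                            AND later.created > r.created)
            THEN 1 ELSE 0
       END AS is_returning
FROM reservations r
ORDER BY r.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these six rows, reservation 7 gets is_returning = 1 and reservation 11 gets 0, as in the desired output. Reservation 10 gets a count of 1 and a flag of 0 only because buyer 485's other reservations are not in the excerpt. Comparing created as text works here because the timestamps are fixed-width "YYYY-MM-DD HH:MM:SS" strings; a DATETIME column in MySQL compares natively. Two reservations by the same buyer with identical timestamps would not flag each other; adding id as a tie-breaker is one option. On a large table, an index on (buyer_id, created) would likely help both subqueries, and on MySQL 8+ a LEAD() window function partitioned by buyer_id is an alternative to EXISTS.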


1 Answer


This is too long for a comment.

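What you are trying to do is called recurrent event analysis. In particular, it is part of the branch of statistics/data mining called "survival analysis".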
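Survival analysis works on records that hold the time until the event of interest (here, the same buyer's next purchase) and whether that event was actually observed (event = 1) or the record was cut off by the end of the data (censored, event = 0). A minimal sketch of building that layout in SQL, assuming the same id/created/buyer_id columns and a hypothetical OBSERVATION_END timestamp marking the end of the extract; it uses SQLite's julianday() for day arithmetic, so other databases would need their own date-difference function.

```python
import sqlite3

# Hypothetical cut-off: the moment the data extract ends. A reservation with
# no later purchase by then is "censored", not "never returned".
OBSERVATION_END = "2007-10-01 00:00:00"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY, created TEXT, buyer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations (id, created, buyer_id) VALUES (?, ?, ?)",
    [
        (3, "2007-08-14 18:28:38", 438),
        (7, "2007-09-19 12:29:52", 474),
        (8, "2007-09-19 13:14:54", 476),
        (9, "2007-09-20 10:22:52", 477),
        (10, "2007-09-25 15:27:45", 485),
        (11, "2007-09-26 20:56:25", 474),
    ],
)

# For each reservation: the buyer's next purchase time (if any), whether that
# repeat purchase was observed (event = 1) or censored (event = 0), and the
# number of days to the repeat purchase or to the cut-off.
QUERY = """
SELECT id,
       buyer_id,
       CASE WHEN next_created IS NULL THEN 0 ELSE 1 END AS event,
       julianday(COALESCE(next_created, :obs_end)) - julianday(created) AS duration_days
FROM (
    SELECT r.id,
           r.buyer_id,
           r.created,
           (SELECT MIN(n.created)
              FROM reservations n
             WHERE n.buyer_id = r.buyer_id
               AND n.created > r.created) AS next_created
    FROM reservations r
) AS t
ORDER BY id
"""

rows = conn.execute(QUERY, {"obs_end": OBSERVATION_END}).fetchall()
for rid, buyer, event, days in rows:
    print(rid, buyer, event, round(days, 2))
```

In the sample, only reservation 7 has an observed repeat purchase (about 7.35 days later); every other row is censored at the cut-off. These (duration_days, event) pairs are the usual input to survival estimators such as Kaplan-Meier. Since one buyer can contribute several records, a recurrent-event model would also account for those records not being independent.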

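Your approach (which is entirely possible in SQL) leads to biased results and biased conclusions, for the simple reason that someone who made a first purchase long ago has had more time, and therefore a greater chance, to come back than someone who made a first purchase yesterday.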
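One simple way to reduce that bias, short of a full survival model, is a fixed follow-up window: score only reservations that have at least W days of data after them, and flag whether the same buyer came back within W days. A minimal sketch under stated assumptions: WINDOW_DAYS and OBSERVATION_END are hypothetical values (the window is kept tiny only so the six sample rows give a non-trivial result; a real analysis would pick a business-relevant window), and the day arithmetic uses SQLite's julianday().

```python
import sqlite3

# Both values are hypothetical.
WINDOW_DAYS = 10                          # follow-up window W, in days
OBSERVATION_END = "2007-10-01 00:00:00"   # assumed end of the data extract

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY, created TEXT, buyer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations (id, created, buyer_id) VALUES (?, ?, ?)",
    [
        (3, "2007-08-14 18:28:38", 438),
        (7, "2007-09-19 12:29:52", 474),
        (8, "2007-09-19 13:14:54", 476),
        (9, "2007-09-20 10:22:52", 477),
        (10, "2007-09-25 15:27:45", 485),
        (11, "2007-09-26 20:56:25", 474),
    ],
)

# Only reservations with at least WINDOW_DAYS of follow-up before the cut-off
# are scored, so every scored reservation had the same time to see a return.
QUERY = """
SELECT r.id,
       r.buyer_id,
       CASE WHEN EXISTS (SELECT 1
                           FROM reservations n
                          WHERE n.buyer_id = r.buyer_id
                            AND n.created > r.created
                            AND julianday(n.created) - julianday(r.created) <= :window_days)
            THEN 1 ELSE 0
       END AS returned_within_window
FROM reservations r
WHERE julianday(:obs_end) - julianday(r.created) >= :window_days
ORDER BY r.id
"""

rows = conn.execute(
    QUERY, {"window_days": WINDOW_DAYS, "obs_end": OBSERVATION_END}
).fetchall()
for row in rows:
    print(row)
```

Here reservations 10 and 11 are left out because they have fewer than 10 days of follow-up; a plain "any later purchase" flag would mark them 0 even though their buyers simply have not had time to come back. This trades away recent data for comparability, which is one reason survival methods, which use censored records instead of discarding them, are usually preferred.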

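I almost never plug one of my books in answers to questions. However, the book Data Analysis Using SQL and Excel has two chapters on survival analysis. These can help you understand this kind of time-to-event problem from a practical perspective.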

answered 2013-05-21T21:21:34.157