I'm wondering if there is a way to automatically select the amount of past data when calculating features.

For example, I might want to predict when a customer will make their next purchase, so it would help to know the purchase count or average purchase price over different lookback windows, e.g. purchases in the last 12 months, the last 3 months, or the last 7 days.

What is the best way to approach this with featuretools?

1 Answer

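You can use the training_window parameter of featuretools.dfs to create a feature matrix that uses only a certain amount of historical data. When a training window is set, Featuretools uses the historical data between cutoff_time - training_window and the cutoff time. Here is an example from the documentation: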

import featuretools as ft

# es (an EntitySet) and cutoff_times are assumed to be defined already.
window_fm, window_features = ft.dfs(entityset=es,
                                    target_entity="customers",
                                    cutoff_time=cutoff_times,
                                    cutoff_time_in_index=True,
                                    training_window="1 hour")

When determining which data can be used, the training window checks whether the time in the time_index column falls within the training window.
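
To make the window check concrete for the purchase example in the question, here is a minimal pure-Python sketch. It is not Featuretools code: the `purchases` data, the `window_features` helper, and the window names are all hypothetical, and treating the window as excluding its start and including the cutoff is an assumption, so check how your Featuretools version handles the boundaries.

```python
from datetime import datetime, timedelta

# Hypothetical purchase log for one customer: (time_index value, amount).
purchases = [
    (datetime(2017, 9, 1), 40.0),
    (datetime(2018, 4, 20), 25.0),
    (datetime(2018, 6, 20), 10.0),
    (datetime(2018, 6, 24), 5.0),
    (datetime(2018, 7, 1), 99.0),  # after the cutoff, so it is never used
]


def window_features(rows, cutoff_time, training_window):
    """Return (count, mean amount) for rows inside the training window.

    Assumed boundary rule: cutoff_time - training_window < time <= cutoff_time.
    """
    start = cutoff_time - training_window
    amounts = [amount for ts, amount in rows if start < ts <= cutoff_time]
    count = len(amounts)
    mean = sum(amounts) / count if count else None
    return count, mean


cutoff = datetime(2018, 6, 25)
windows = {
    "365d": timedelta(days=365),
    "90d": timedelta(days=90),
    "7d": timedelta(days=7),
}

# One (count, mean) pair per lookback window, all at the same cutoff time.
features = {
    name: window_features(purchases, cutoff, window)
    for name, window in windows.items()
}

for name, (count, mean) in features.items():
    print(name, count, mean)
```

With Featuretools itself, the same idea would mean calling ft.dfs once per training_window value and combining the resulting feature matrices; the sketch only illustrates which rows each window would admit.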

Answered 2018-06-25T13:09:45.497