
I tried the Featuretools example mentioned at this URL: https://docs.featuretools.com/index.html

The customers dataframe contains the following data:

In [4]: customers_df
Out[4]:
   customer_id  zip_code            join_date  date_of_birth
0            1     60091  2011-04-17 10:48:33     1994-07-18
1            2     13244  2012-04-15 23:31:04     1986-08-18

I created the feature matrix for each customer in the data. About 73 features were created, but the features/columns join_date and date_of_birth are not kept in feature_matrix_customers.

Questions:

1) Is there an option to keep the features/columns join_date and date_of_birth in feature_matrix_customers?

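2) Featuretools DFS does not extract the time from join_date and date_of_birth, or create any features from it. Is there a way to get hours, mins and secs features, similar to the year, month and day feature columns?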


1 Answer


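To extract additional date-related features, you need to list the corresponding transform primitives in the trans_primitives argument when calling ft.dfs: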

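import featuretools as ft

# Load the demo EntitySet (customers, sessions, transactions)
es = ft.demo.load_mock_customer(return_entityset=True)

# trans_primitives controls which datetime parts DFS extracts;
# features_only=True returns the feature definitions without computing them.
# (target_entity is the pre-1.0 argument name; Featuretools 1.x calls it target_dataframe_name.)
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=["count", "sum", "mode"],
                  trans_primitives=["day", "hour", "weekend", "month", "year"],
                  features_only=True)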

I used the features_only parameter, so this returns only the feature definitions. The variable features now looks like this:

[<Feature: zip_code>,
 <Feature: COUNT(transactions)>,
 <Feature: DAY(date_of_birth)>,
 <Feature: WEEKEND(join_date)>,
 <Feature: COUNT(sessions)>,
 <Feature: WEEKEND(date_of_birth)>,
 <Feature: HOUR(date_of_birth)>,
 <Feature: DAY(join_date)>,
 <Feature: MODE(sessions.device)>,
 <Feature: SUM(transactions.amount)>,
 <Feature: YEAR(join_date)>,
 <Feature: HOUR(join_date)>,
 <Feature: YEAR(date_of_birth)>,
 <Feature: MONTH(join_date)>,
 <Feature: MONTH(date_of_birth)>,
 <Feature: MODE(transactions.product_id)>,
 <Feature: MODE(sessions.MODE(transactions.product_id))>,
 <Feature: MODE(sessions.MONTH(session_start))>,
 <Feature: MODE(sessions.DAY(session_start))>,
 <Feature: MODE(sessions.YEAR(session_start))>,
 <Feature: MODE(sessions.HOUR(session_start))>]
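Each of these datetime transform primitives reads one component of a timestamp per row. As a rough, library-free sketch (plain Python datetime rather than Featuretools; the helper name datetime_parts is hypothetical, and it assumes WEEKEND means Saturday or Sunday), this is what DAY, HOUR, WEEKEND, MONTH and YEAR compute for customer 1. It also includes the minute and second parts from question 2: Featuretools provides "minute" and "second" transform primitives that can be listed in trans_primitives the same way as "hour".

```python
from datetime import datetime


def datetime_parts(value: datetime) -> dict:
    """Hypothetical helper: the per-row values that datetime transform
    primitives such as DAY or HOUR extract from a single timestamp."""
    return {
        "YEAR": value.year,
        "MONTH": value.month,
        "DAY": value.day,
        "HOUR": value.hour,
        "MINUTE": value.minute,  # what a "minute" primitive would yield
        "SECOND": value.second,  # what a "second" primitive would yield
        # weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend
        "WEEKEND": value.weekday() >= 5,
    }


# join_date and date_of_birth of customer 1 from the question
join_date = datetime(2011, 4, 17, 10, 48, 33)
date_of_birth = datetime(1994, 7, 18)

for name, value in [("join_date", join_date), ("date_of_birth", date_of_birth)]:
    for part, result in datetime_parts(value).items():
        print(f"{part}({name}) = {result}")
```

For customer 1 these values agree with the DAY, HOUR, WEEKEND, MONTH and YEAR columns of the feature matrix shown further down (for example DAY(join_date) = 17, HOUR(join_date) = 10, WEEKEND(date_of_birth) = False).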

Featuretools only returns numeric and categorical features, so we have to add the datetime features manually, like this:

# Identity features pass the raw datetime columns through unchanged
features += [ft.Feature(es["customers"]["join_date"]),
             ft.Feature(es["customers"]["date_of_birth"])]

Now we can calculate the features on the actual data:

fm = ft.calculate_feature_matrix(entityset=es, features=features)

This returns the feature matrix with join_date and date_of_birth as the last two columns of the dataframe:

            zip_code  COUNT(transactions)  DAY(date_of_birth)  WEEKEND(join_date)  COUNT(sessions)  WEEKEND(date_of_birth)  HOUR(date_of_birth)  DAY(join_date) MODE(sessions.device)  SUM(transactions.amount)  YEAR(join_date)  HOUR(join_date)  YEAR(date_of_birth)  MONTH(join_date)  MEAN(transactions.amount)  MODE(transactions.product_id)  MONTH(date_of_birth)  MEAN(sessions.COUNT(transactions))  MODE(sessions.MODE(transactions.product_id))  MEAN(sessions.MEAN(transactions.amount))  MODE(sessions.MONTH(session_start))  MODE(sessions.DAY(session_start))  MEAN(sessions.SUM(transactions.amount))  MODE(sessions.YEAR(session_start))  MODE(sessions.HOUR(session_start))  SUM(sessions.MEAN(transactions.amount))           join_date date_of_birth
customer_id
1              60091                  126                  18                True                8                   False                    0              17                mobile                   9025.62             2011               10                 1994                 4                  71.631905                              4                     7                           15.750000                                             4                                 72.774140                                    1                                  1                              1128.202500                                2014                                   6                               582.193117 2011-04-17 10:48:33    1994-07-18
2              13244                   93                  18                True                7                   False                    0              15               desktop                   7200.28             2012               23                 1986                 4                  77.422366                              4                     8                           13.285714                                             3                                 78.415122                                    1                                  1                              1028.611429                                2014                                   3                               548.905851 2012-04-15 23:31:04    1986-08-18
3              13244                   93                  21                True                6                   False                    0              13               desktop                   6236.62             2011               15                 2003                 8                  67.060430                              1                    11                           15.500000                                             1                                 67.539577                                    1                                  1                              1039.436667                                2014                                   5                               405.237462 2011-08-13 15:42:34    2003-11-21
4              60091                  109                  15               False                8                   False                    0               8                mobile                   8727.68             2011               20                 2006                 4                  80.070459                              2                     8                           13.625000                                             1                                 81.207189                                    1                                  1                              1090.960000                                2014                                   1                               649.657515 2011-04-08 20:08:14    2006-08-15
5              60091                   79                  28                True                6                    True                    0              17                mobile                   6349.66             2010                5                 1984                 7                  80.375443                              5                     7                           13.166667                                             3                                 78.705187                                    1                                  1                              1058.276667                                2014                                   0                               472.231119 2010-07-17 05:27:50    1984-07-28
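The approach above (build definitions with features_only=True, append identity features for the raw columns, then compute) can be mimicked in plain Python to show why it keeps join_date and date_of_birth: an identity feature simply passes the stored value through. This is a hypothetical toy model; Feature, identity, transform and calculate are illustrative names, not the Featuretools API, and the data are two of the customer rows from the question.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Feature:
    """Toy feature definition (hypothetical, not featuretools.Feature):
    a column name plus a function that computes it from one row."""
    name: str
    compute: Callable[[Row], Any]


def identity(column: str) -> Feature:
    # An identity feature returns the raw column value unchanged
    return Feature(column, lambda row: row[column])


def transform(prim: str, column: str, fn: Callable[[Any], Any]) -> Feature:
    # A transform feature applies a primitive to one column, e.g. HOUR(join_date)
    return Feature(f"{prim}({column})", lambda row: fn(row[column]))


def calculate(rows: Dict[int, Row], features: List[Feature]) -> Dict[int, Row]:
    # Evaluate every definition on every row; column order follows the list
    return {cid: {f.name: f.compute(row) for f in features}
            for cid, row in rows.items()}


customers = {
    1: {"zip_code": "60091", "join_date": datetime(2011, 4, 17, 10, 48, 33),
        "date_of_birth": datetime(1994, 7, 18)},
    2: {"zip_code": "13244", "join_date": datetime(2012, 4, 15, 23, 31, 4),
        "date_of_birth": datetime(1986, 8, 18)},
}

# Step 1: definitions only, analogous to features_only=True
features = [identity("zip_code"),
            transform("HOUR", "join_date", lambda d: d.hour),
            transform("YEAR", "date_of_birth", lambda d: d.year)]
# Step 2: append identity features for the raw datetime columns
features += [identity("join_date"), identity("date_of_birth")]
# Step 3: compute the matrix
fm = calculate(customers, features)
for cid, values in fm.items():
    print(cid, values)
```

As in the real output, the raw datetime columns end up last because they were appended last to the list of definitions.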
Answered 2018-10-29T22:07:22.250