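To extract additional date-related features, you need to request them as transform primitives (the trans_primitives argument) when calling ft.dfs.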
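import featuretools as ft

# Load the demo EntitySet that ships with Featuretools
es = ft.demo.load_mock_customer(return_entityset=True)

# The date parts are requested through trans_primitives;
# features_only=True returns feature definitions without computing them
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=["count", "sum", "mode"],
                  trans_primitives=["day", "hour", "weekend", "month", "year"],
                  features_only=True)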
I used the features_only parameter, so this returns only the feature definitions. The variable features now looks like this:
[<Feature: zip_code>,
<Feature: COUNT(transactions)>,
<Feature: DAY(date_of_birth)>,
<Feature: WEEKEND(join_date)>,
<Feature: COUNT(sessions)>,
<Feature: WEEKEND(date_of_birth)>,
<Feature: HOUR(date_of_birth)>,
<Feature: DAY(join_date)>,
<Feature: MODE(sessions.device)>,
<Feature: SUM(transactions.amount)>,
<Feature: YEAR(join_date)>,
<Feature: HOUR(join_date)>,
<Feature: YEAR(date_of_birth)>,
<Feature: MONTH(join_date)>,
<Feature: MONTH(date_of_birth)>,
<Feature: MODE(transactions.product_id)>,
<Feature: MODE(sessions.MODE(transactions.product_id))>,
<Feature: MODE(sessions.MONTH(session_start))>,
<Feature: MODE(sessions.DAY(session_start))>,
<Feature: MODE(sessions.YEAR(session_start))>,
<Feature: MODE(sessions.HOUR(session_start))>]
Featuretools only returns numeric and categorical features, so we have to add the datetime features manually, like this:
features += [ft.Feature(es["customers"]["join_date"]),
             ft.Feature(es["customers"]["date_of_birth"])]
Now we can calculate the features on the actual data:
fm = ft.calculate_feature_matrix(entityset=es, features=features)
This returns a DataFrame with join_date and date_of_birth as its last two columns:
zip_code COUNT(transactions) DAY(date_of_birth) WEEKEND(join_date) COUNT(sessions) WEEKEND(date_of_birth) HOUR(date_of_birth) DAY(join_date) MODE(sessions.device) SUM(transactions.amount) YEAR(join_date) HOUR(join_date) YEAR(date_of_birth) MONTH(join_date) MEAN(transactions.amount) MODE(transactions.product_id) MONTH(date_of_birth) MEAN(sessions.COUNT(transactions)) MODE(sessions.MODE(transactions.product_id)) MEAN(sessions.MEAN(transactions.amount)) MODE(sessions.MONTH(session_start)) MODE(sessions.DAY(session_start)) MEAN(sessions.SUM(transactions.amount)) MODE(sessions.YEAR(session_start)) MODE(sessions.HOUR(session_start)) SUM(sessions.MEAN(transactions.amount)) join_date date_of_birth
customer_id
1 60091 126 18 True 8 False 0 17 mobile 9025.62 2011 10 1994 4 71.631905 4 7 15.750000 4 72.774140 1 1 1128.202500 2014 6 582.193117 2011-04-17 10:48:33 1994-07-18
2 13244 93 18 True 7 False 0 15 desktop 7200.28 2012 23 1986 4 77.422366 4 8 13.285714 3 78.415122 1 1 1028.611429 2014 3 548.905851 2012-04-15 23:31:04 1986-08-18
3 13244 93 21 True 6 False 0 13 desktop 6236.62 2011 15 2003 8 67.060430 1 11 15.500000 1 67.539577 1 1 1039.436667 2014 5 405.237462 2011-08-13 15:42:34 2003-11-21
4 60091 109 15 False 8 False 0 8 mobile 8727.68 2011 20 2006 4 80.070459 2 8 13.625000 1 81.207189 1 1 1090.960000 2014 1 649.657515 2011-04-08 20:08:14 2006-08-15
5 60091 79 28 True 6 True 0 17 mobile 6349.66 2010 5 1984 7 80.375443 5 7 13.166667 3 78.705187 1 1 1058.276667 2014 0 472.231119 2010-07-17 05:27:50 1984-07-28
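As a sanity check on the date-derived columns, the values for a single row can be reproduced with the standard library alone. The sketch below is not how Featuretools computes them internally. The helper name date_parts is hypothetical, and the weekend rule (Saturday or Sunday) is an assumption, although it matches the rows shown above.

```python
from datetime import datetime


def date_parts(timestamp: str) -> dict:
    """Split a timestamp string into the parts the primitives above derive.

    Hypothetical helper for illustration; not part of Featuretools.
    """
    dt = datetime.fromisoformat(timestamp)
    return {
        "DAY": dt.day,
        "HOUR": dt.hour,
        # Monday is 0, so 5 and 6 are Saturday and Sunday (assumed weekend rule)
        "WEEKEND": dt.weekday() >= 5,
        "MONTH": dt.month,
        "YEAR": dt.year,
    }


# join_date of customer 1, copied from the table above
print(date_parts("2011-04-17 10:48:33"))
```

For customer 1 this gives day 17, hour 10, month 4, year 2011 and a weekend flag of True, which agrees with the DAY(join_date), HOUR(join_date), MONTH(join_date), YEAR(join_date) and WEEKEND(join_date) columns. A date-only value such as date_of_birth has hour 0, which is why HOUR(date_of_birth) is 0 in every row.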