I want to keep one column of my dataframe in its original state and not apply any primitives to it. Is this possible?
Yes, you can do this with the ignore_variables parameter to ft.dfs. Here is an example using a demo entity set.
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
es.plot()
If we want to build features for the sessions entity but ignore the device variable, we can run
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      ignore_variables={"sessions": ["device"]},
                      features_only=True)
feature_defs
which gives the following features
[<Feature: customer_id>,
<Feature: COUNT(transactions)>,
<Feature: MODE(transactions.product_id)>,
<Feature: customers.zip_code>,
<Feature: MODE(transactions.products.brand)>,
<Feature: customers.COUNT(sessions)>,
<Feature: customers.COUNT(transactions)>,
<Feature: customers.MODE(transactions.product_id)>]
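This creates features using the count and mode primitives, but ignores the device variable in the sessions entity. If we want to include the device variable in its original state, we can add it back like this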
feature_defs += [ft.Feature(es["sessions"]["device"])]
Now we can calculate the feature matrix. The device column is now at the end.
fm = ft.calculate_feature_matrix(features=feature_defs, entityset=es)
fm
customer_id COUNT(transactions) MODE(transactions.product_id) customers.zip_code ... customers.COUNT(sessions) customers.COUNT(transactions) customers.MODE(transactions.product_id) device
session_id ...
1 2 16 3 13244 ... 7 93 4 desktop
2 5 10 5 60091 ... 6 79 5 mobile
3 4 15 1 60091 ... 8 109 2 mobile
4 1 25 5 60091 ... 8 126 4 mobile
5 4 11 5 60091 ... 8 109 2 mobile
6 1 15 4 60091 ... 8 126 4 tablet
7 3 15 1 13244 ... 6 93 1 tablet
8 4 18 1 60091 ... 8 109 2 tablet
9 1 15 1 60091 ... 8 126 4 desktop
10 2 15 2 13244 ... 7 93 4 tablet
11 4 15 3 60091 ... 8 109 2 mobile
12 4 10 4 60091 ... 8 109 2 desktop
13 4 12 2 60091 ... 8 109 2 mobile
14 1 12 4 60091 ... 8 126 4 tablet
15 2 8 2 13244 ... 7 93 4 desktop
16 2 10 4 13244 ... 7 93 4 desktop
17 2 13 1 13244 ... 7 93 4 tablet
18 1 12 2 60091 ... 8 126 4 desktop
19 3 17 1 13244 ... 6 93 1 desktop
20 5 15 1 60091 ... 6 79 5 desktop
21 4 18 5 60091 ... 8 109 2 desktop
22 4 10 2 60091 ... 8 109 2 desktop
23 3 11 3 13244 ... 6 93 1 desktop
24 5 14 4 60091 ... 6 79 5 tablet
25 3 16 1 13244 ... 6 93 1 desktop
26 1 16 1 60091 ... 8 126 4 tablet
27 1 15 5 60091 ... 8 126 4 mobile
28 5 18 2 60091 ... 6 79 5 mobile
29 1 16 4 60091 ... 8 126 4 mobile
30 5 14 3 60091 ... 6 79 5 desktop
31 2 18 3 13244 ... 7 93 4 mobile
32 5 8 3 60091 ... 6 79 5 mobile
33 2 13 3 13244 ... 7 93 4 mobile
34 3 18 4 13244 ... 6 93 1 desktop
35 3 16 5 13244 ... 6 93 1 mobile
As a sanity check, this is the output if we don't use ignore_variables
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      features_only=True)
You can see that the feature <Feature: customers.MODE(sessions.device)> is now created
[<Feature: customer_id>,
<Feature: device>,
<Feature: COUNT(transactions)>,
<Feature: MODE(transactions.product_id)>,
<Feature: customers.zip_code>,
<Feature: MODE(transactions.products.brand)>,
<Feature: customers.COUNT(sessions)>,
<Feature: customers.MODE(sessions.device)>,
<Feature: customers.COUNT(transactions)>,
<Feature: customers.MODE(transactions.product_id)>]
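To make the comparison explicit, here is a small sketch that diffs the two feature lists. It does not call featuretools at all: the names are copied by hand from the two outputs in this answer and treated as plain strings, and the variable names with_ignore, without_ignore and removed are only illustrative.

```python
# Feature names copied by hand from the two feature lists in this answer.
# Nothing here calls featuretools; the names are treated as plain strings.
with_ignore = [
    "customer_id",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
]
without_ignore = [
    "customer_id",
    "device",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.MODE(sessions.device)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
]

# Features that only appear when ignore_variables is not passed,
# kept in the order they were listed.
removed = [name for name in without_ignore if name not in with_ignore]
print(removed)  # ['device', 'customers.MODE(sessions.device)']
```

Both entries involve device: the raw column itself and the aggregation built on it. That is why the raw column has to be added back by hand after running with ignore_variables.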
answered 2019-02-07T14:29:15.463