featuretools - How to include only certain features when running Deep Feature Synthesis?

Question

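For example, one of my entities has two sets of IDs. One is sequential (which is apparently required to create an EntitySet), and the other is used as a foreign key when merging with my other table.

As a result, featuretools includes the IDs in the set of features to aggregate. SUM(ID) is not a feature I am interested in.

Is there a way to include only certain features when running Deep Feature Synthesis?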

score 2 · Accepted Answer

There are three ways to exclude features when calling ft.dfs.

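Use ignore_variables to specify variables in an entity that should not be used to create features. It is a dictionary that maps an entity id to a list of variable names to ignore.
Use drop_contains to drop features whose names contain any of the strings listed in this parameter.
Use drop_exact to drop features whose names exactly match any of the strings listed in this parameter.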

Here is an example that uses all three in a single ft.dfs call:

ft.dfs(target_entity="customers",
       ignore_variables={
           "transactions": ["amount"],
           "customers": ["age", "gender", "date_of_birth"]
       }, # ignore these variables
       drop_contains=["customers.SUM("],  # drop features that contain these strings
       drop_exact=["STD(transactions.quantity)"],  # drop features named exactly this
       ...
 )

All three parameters are documented in the featuretools API reference for ft.dfs.
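To make the difference between substring matching and exact matching concrete, here is a minimal plain-Python sketch of the name-based filtering described above. It is not featuretools code: the helper `filter_feature_names` and the feature names in `candidates` are hypothetical and only model how drop_contains and drop_exact are described to behave.

```python
def filter_feature_names(names, drop_contains=(), drop_exact=()):
    """Toy model of the two name-based filters (an assumption, not the
    library's implementation)."""
    kept = []
    for name in names:
        # drop_exact: the whole feature name must match one of the strings
        if name in drop_exact:
            continue
        # drop_contains: any listed string appearing anywhere in the name
        if any(fragment in name for fragment in drop_contains):
            continue
        kept.append(name)
    return kept


# Hypothetical feature names, for illustration only
candidates = [
    "SUM(transactions.id)",
    "SUM(transactions.amount)",
    "STD(transactions.quantity)",
    "MEAN(transactions.amount)",
]

kept = filter_feature_names(
    candidates,
    drop_contains=["SUM("],
    drop_exact=["STD(transactions.quantity)"],
)
print(kept)  # ['MEAN(transactions.amount)']
```

Note that a substring filter such as "SUM(" removes every SUM feature, including ones you might want, so a narrower string or ignore_variables is often the safer choice.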

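If you are still getting features you don't want, the last thing to check is the variable types of the variables in your entity set. If you see the sum of an ID variable, it must mean that featuretools thinks the ID variable is numeric. If you tell featuretools that it is an ID, it will not apply numeric aggregations to it.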
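The effect of the variable type can be illustrated with a small toy model. This is a sketch under stated assumptions, not featuretools internals: the type labels "numeric" and "id", the column name customer_key, and the helper `numeric_aggregation_names` are all hypothetical. It only shows the reasoning that numeric aggregations are generated for variables typed as numeric and for nothing else.

```python
# Toy list of numeric aggregations; the real set depends on the primitives you pass
NUMERIC_AGGREGATIONS = ["SUM", "MEAN", "STD"]


def numeric_aggregation_names(entity, variable_types):
    """Return the numeric aggregation feature names this toy model would build."""
    return [
        f"{agg}({entity}.{variable})"
        for variable, vtype in variable_types.items()
        if vtype == "numeric"  # only numeric variables get numeric aggregations
        for agg in NUMERIC_AGGREGATIONS
    ]


# The foreign key is wrongly treated as a number
mistyped = {"customer_key": "numeric", "amount": "numeric"}
# The foreign key is declared as an ID
corrected = {"customer_key": "id", "amount": "numeric"}

before = numeric_aggregation_names("transactions", mistyped)
after = numeric_aggregation_names("transactions", corrected)

print("SUM(transactions.customer_key)" in before)  # True
print("SUM(transactions.customer_key)" in after)   # False
```

Fixing the type at the source removes the unwanted features without having to list each one in drop_exact or drop_contains.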
