python - Featuretools categorical handling

Question

Featuretools offers integrated functionality for handling categorical variables:

variable_types={"product_id": ft.variable_types.Categorical}

See https://docs.featuretools.com/loading_data/using_entitysets.html

However, should these be strings or pandas.Category types for optimal compatibility with Featuretools?

Edit

Also, do all columns need to be specified manually, as in https://github.com/Featuretools/predict-appointment-noshow/blob/master/Tutorial.ipynb, or will they be inferred automatically from the corresponding pandas dtypes?

import featuretools.variable_types as vtypes
variable_types = {'gender': vtypes.Categorical,
                  'patient_id': vtypes.Categorical,
                  'age': vtypes.Ordinal,
                  'scholarship': vtypes.Boolean,
                  'hypertension': vtypes.Boolean,
                  'diabetes': vtypes.Boolean,
                  'alcoholism': vtypes.Boolean,
                  'handicap': vtypes.Boolean,
                  'no_show': vtypes.Boolean,
                  'sms_received': vtypes.Boolean}

score 3 · Accepted Answer

You should use the Pandas Category dtype when loading data into Featuretools. Compared to using strings, this significantly reduces memory usage.
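To get a feel for the memory difference, here is a minimal pandas-only sketch. The product ids are made up for illustration, and the exact byte counts will depend on your pandas version and data, so treat it as a rough comparison rather than a benchmark.

```python
import pandas as pd

# Hypothetical data: 300,000 rows drawn from only three distinct product ids.
product_ids = ["prod_a", "prod_b", "prod_c"] * 100_000

# Same values stored two ways: Python string objects vs. the Category dtype.
as_strings = pd.Series(product_ids, dtype="object")
as_category = as_strings.astype("category")

# deep=True also counts the memory held by the string objects themselves.
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object dtype:   {string_bytes:,} bytes")
print(f"category dtype: {category_bytes:,} bytes")
```

The saving comes from the Category dtype storing each distinct value once and keeping only small integer codes per row, so it is largest when a column has few distinct values relative to its length.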

You do not need to specify every variable type manually when loading data. If a type is not provided, Featuretools will try to infer it from the Pandas dtype.
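Because inference starts from the pandas dtypes, one way to make it more predictable is to set those dtypes explicitly before loading. The sketch below uses only pandas; the column names are borrowed from the question and the values are invented. How each dtype is mapped to a variable type depends on the Featuretools version you have installed, so it is worth inspecting the inferred types after loading.

```python
import pandas as pd

# Hypothetical sample rows using column names from the question.
df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3", "p4"],
    "gender": ["F", "M", "F", "F"],
    "age": [34, 51, 8, 62],
    "no_show": [0, 1, 0, 0],
})

# Give each column the dtype that matches its intended meaning, so that
# dtype-based inference has an unambiguous signal to work from.
df = df.astype({
    "patient_id": "category",
    "gender": "category",
    "no_show": "bool",
})

print(df.dtypes)
```

Any column whose intended type cannot be expressed through a pandas dtype alone (for example, treating the integer age column as ordinal, as the question does) would still need an explicit entry in variable_types.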
