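I have a dataset with time variables as well as numeric and categorical variables. There is an ITEMID column, and each ITEMID has 2 to 12 rows of data.
It has columns such as a start date and a transaction date, plus various numeric and categorical columns. The start date is the same for all rows of a given ITEMID, while the transaction date is different for each row.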
import featuretools as ft
import pandas as pd
# creating an entity set
entity_set = ft.EntitySet(id = 'rem_dur')
# adding a dataframe
entity_set.entity_from_dataframe(entity_id = 'enh', dataframe = dataset, index = 'unique_id'
,variable_types = {'Start_Date': ft.variable_types.DatetimeTimeIndex})
#unique_id is just row number from 1 to number of rows in dataset
entity_set.normalize_entity(base_entity_id='enh', new_entity_id= 'categorical_vars', index = 'ITEMID',
additional_variables = ['cat_var_1', 'cat_var_2'])
###cutoff date
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
cutoff_df = dataset[["unique_id", "trans_date"]].copy()
cutoff_df["trans_date"] = pd.to_datetime(cutoff_df["trans_date"])
##feature engg
feature_matrix_2, feature_names_2 = ft.dfs(entityset=entity_set
,target_entity = 'enh'
,max_depth = 2
,verbose = 1
,ignore_entities = ['categorical_vars']
,ignore_variables =ignore_features_dict
,dask_kwargs={'cluster': cluster})
DFS does not generate any time series features. It just returns all the original features, minus the ones that are ignored.
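To make the goal concrete, here is a minimal sketch in plain Python (deliberately not Featuretools) of one possible reading of "time series features": for each row, aggregates over the earlier rows of the same ITEMID, using that row's transaction date as the cutoff. The toy rows, the `amount` column, and the `history_features` helper are hypothetical and only illustrate the idea; the real dataset has different columns.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy rows: (unique_id, ITEMID, Start_Date, trans_date, amount).
# 'amount' is a made-up numeric column used only for illustration.
rows = [
    (1, "A", date(2020, 1, 1), date(2020, 1, 5), 10.0),
    (2, "A", date(2020, 1, 1), date(2020, 1, 9), 30.0),
    (3, "A", date(2020, 1, 1), date(2020, 1, 20), 20.0),
    (4, "B", date(2020, 2, 1), date(2020, 2, 3), 5.0),
    (5, "B", date(2020, 2, 1), date(2020, 2, 10), 15.0),
]


def history_features(rows):
    """Build per-row features from earlier rows of the same ITEMID only."""
    # Group rows by ITEMID (second field of each tuple).
    by_item = defaultdict(list)
    for row in rows:
        by_item[row[1]].append(row)

    features = {}
    for item_rows in by_item.values():
        for uid, _, start, trans, _ in item_rows:
            # Only rows strictly before this row's trans_date count as history.
            past = [r[4] for r in item_rows if r[3] < trans]
            features[uid] = {
                "days_since_start": (trans - start).days,
                "n_prior": len(past),
                "mean_prior_amount": sum(past) / len(past) if past else None,
            }
    return features


features = history_features(rows)
```

The first row of each ITEMID has no history, so its count is 0 and its mean is `None`. Later rows only see transactions dated before their own, which is the behaviour I would expect a per-row cutoff on the transaction date to give.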