
My environment is:

Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0

My pandas DataFrame looks like this:

df
    index  BoxRatio    Thrust  Velocity  OnBalRun  vwapGain
0      1  0.324000  0.615000  1.525000  3.618000  0.416000
1      2  0.938249  0.366377  2.402230  6.393223  2.667106
2      3  0.317000 -0.281000  0.979000  1.489000  0.506000
3      4  0.289000 -0.433000  0.796000  2.081000  0.536000
4      5  1.551115 -0.103734  0.731682  1.752156  0.667016

I tried the following:

  import numpy as np
  import featuretools as ft
  from featuretools.primitives import make_trans_primitive
  from featuretools.variable_types import Numeric

  es = ft.EntitySet('Pattern')
  es.entity_from_dataframe(dataframe=df,
                           entity_id='my_id',
                           index='index')

  # Custom transform primitive: element-wise base-10 logarithm
  def log10(column):
      return np.log10(column)

  Log10 = make_trans_primitive(function=log10,
                               input_types=[Numeric],
                               return_type=Numeric)

  from featuretools.primitives import (Count, Sum, Mean, Median, Std, Min, Max, Multiply)

  feature_matrix, feature_names = ft.dfs(entityset=es,
                                         target_entity='my_id',
                                         trans_primitives=[Log10])
  print('feature_names:\n')
  for item in feature_names:
      print('  ' + item)

This gives the following:

feature_names:
<Feature:    + BoxRatio>
<Feature:    + Thrust>
<Feature:    + Velocity>
<Feature:    + OnBalRun>
<Feature:    + vwapGain>
<Feature:    + LOG10(BoxRatio)>
<Feature:    + LOG10(Thrust)>
<Feature:    + LOG10(Velocity)>
<Feature:    + LOG10(OnBalRun)>
<Feature:    + LOG10(vwapGain)>

So far so good... Now if I add the "Min" primitive, I get:

Traceback (most recent call last):
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
    Main()
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
    trans_primitives=[Log10, Min])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
    features = dfs_object.build_features(verbose=verbose)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
    all_features, max_depth=self.max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
    all_features, entity, max_depth=max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
    new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'

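I expected to see the minimum of each column feature (just as with the Log10 primitive). Of course I could define my own Min primitive, but I was hoping there is a simple solution.

Charles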


1 Answer


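The issue here is that Min is an aggregation primitive, whereas Log10 is a transform primitive. Passing Min in trans_primitives is what triggers the error about the missing 'parent_entity' argument.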

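Aggregation primitives take related instances as input and output a single value. They are applied across a parent-child relationship in an entity set. For example, Min takes a list of values and returns the minimum of that list.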
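
To make the aggregation idea concrete, here is a minimal plain-Python sketch (it does not use Featuretools). The parent_id key and the grouping of rows are assumptions made up for illustration; the DataFrame in the question has no parent key, which is why there is nothing for Min to aggregate over.

```python
from collections import defaultdict

# Hypothetical child rows: (parent_id, BoxRatio).
# parent_id is an assumed column for illustration only; the question's
# DataFrame contains a single entity with no parent to group by.
child_rows = [
    ("A", 0.324),
    ("A", 0.938249),
    ("B", 0.317),
    ("B", 0.289),
    ("B", 1.551115),
]


def aggregate_min(rows):
    """Group child values by parent id and return one minimum per parent."""
    groups = defaultdict(list)
    for parent_id, value in rows:
        groups[parent_id].append(value)
    return {parent_id: min(values) for parent_id, values in groups.items()}


min_per_parent = aggregate_min(child_rows)
print(min_per_parent)
```

Five child rows collapse into one value per parent, so the result lives on the parent entity rather than on the rows themselves.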

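Transform primitives take one or more variables from an entity as input and output a new variable for that entity. They are applied to a single entity. For example, log takes a list of values and returns a list of the same length containing the log of each item.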
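
By contrast, a transform can be sketched in plain Python as an element-wise mapping (again without Featuretools). The function name transform_log10 is made up for illustration, and the values are the BoxRatio column from the question.

```python
import math

# BoxRatio values from the DataFrame in the question.
box_ratio = [0.324, 0.938249, 0.317, 0.289, 1.551115]


def transform_log10(values):
    """Return a new list with the base-10 log of each input value."""
    return [math.log10(v) for v in values]


log_box_ratio = transform_log10(box_ratio)

# One output per input row: the new variable stays on the same entity.
print(len(box_ratio), len(log_box_ratio))
```

The output has exactly one value per row, which is why a transform works on a single entity while an aggregation needs a relationship to summarize over.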

You can read more in the documentation on primitives: https://docs.featuretools.com/automated_feature_engineering/primitives.html

Answered 2018-09-01T21:56:02.843