My environment is:
Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0
My pandas DataFrame looks like this:
df
   index  BoxRatio    Thrust  Velocity  OnBalRun  vwapGain
0      1  0.324000  0.615000  1.525000  3.618000  0.416000
1      2  0.938249  0.366377  2.402230  6.393223  2.667106
2      3  0.317000 -0.281000  0.979000  1.489000  0.506000
3      4  0.289000 -0.433000  0.796000  2.081000  0.536000
4      5  1.551115 -0.103734  0.731682  1.752156  0.667016
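For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the printed values. The numbers are copied from the printout above; the real data may carry more decimal places than are displayed.

```python
import pandas as pd

# Values copied from the printed frame; 'index' is an ordinary column
# that is later used as the entity index.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "BoxRatio": [0.324, 0.938249, 0.317, 0.289, 1.551115],
    "Thrust": [0.615, 0.366377, -0.281, -0.433, -0.103734],
    "Velocity": [1.525, 2.402230, 0.979, 0.796, 0.731682],
    "OnBalRun": [3.618, 6.393223, 1.489, 2.081, 1.752156],
    "vwapGain": [0.416, 2.667106, 0.506, 0.536, 0.667016],
})

print(df)
```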
I tried the following:
import numpy as np
import featuretools as ft
from featuretools.primitives import (Count, Sum, Mean, Median, Std, Min, Max,
                                     Multiply, make_trans_primitive)
from featuretools.variable_types import Numeric

# Single-entity EntitySet built from the DataFrame above
es = ft.EntitySet('Pattern')
es.entity_from_dataframe(dataframe=df,
                         entity_id='my_id',
                         index='index')

# Custom transform primitive: element-wise base-10 logarithm
def log10(column):
    return np.log10(column)

Log10 = make_trans_primitive(function=log10,
                             input_types=[Numeric],
                             return_type=Numeric)

feature_matrix, feature_names = ft.dfs(entityset=es,
                                       target_entity='my_id',
                                       trans_primitives=[Log10])

print('feature_names:\n')
for item in feature_names:
    print(' ' + item)
This produces the following:
feature_names:
<Feature: + BoxRatio>
<Feature: + Thrust>
<Feature: + Velocity>
<Feature: + OnBalRun>
<Feature: + vwapGain>
<Feature: + LOG10(BoxRatio)>
<Feature: + LOG10(Thrust)>
<Feature: + LOG10(Velocity)>
<Feature: + LOG10(OnBalRun)>
<Feature: + LOG10(vwapGain)>
So far so good... but if I now add the Min primitive to trans_primitives, I get:
Traceback (most recent call last):
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
    Main()
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
    trans_primitives=[Log10, Min])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
    features = dfs_object.build_features(verbose=verbose)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
    all_features, max_depth=self.max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
    all_features, entity, max_depth=max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
    new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'
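I expected to see the minimum of each column feature, the same way the Log10 primitive gave me one new feature per column. I could of course define my own Min primitive, but I am hoping there is a simpler solution.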
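To make the expected result concrete, here is a plain pandas/NumPy sketch of the two computations, with no Featuretools involved. It rebuilds the frame from the printed values, which is an assumption about the real data. Note that the two are different kinds of operation: log10 is element-wise and yields one value per row, whereas a column minimum collapses each column to a single value. That difference may be related to the error above, since Min is imported alongside aggregation-style primitives such as Count, Sum and Mean.

```python
import numpy as np
import pandas as pd

# Values copied from the printed frame in the question.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5],
    "BoxRatio": [0.324, 0.938249, 0.317, 0.289, 1.551115],
    "Thrust": [0.615, 0.366377, -0.281, -0.433, -0.103734],
    "Velocity": [1.525, 2.402230, 0.979, 0.796, 0.731682],
    "OnBalRun": [3.618, 6.393223, 1.489, 2.081, 1.752156],
    "vwapGain": [0.416, 2.667106, 0.506, 0.536, 0.667016],
})

# Work only on the numeric feature columns, not the identifier.
values = df.drop(columns="index")

# Element-wise transform: same shape as the input, one value per row.
# Negative Thrust values have no real log10, so they come out as NaN.
with np.errstate(invalid="ignore"):
    logged = np.log10(values)

# Aggregation: each column collapses to a single number.
column_min = values.min()

print(logged.shape)
print(column_min)
```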
Charles