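TPOT exported a pipeline with the note "Average CV score on the training set was: -128.90187963562252" (neg_MAE). However, refitting the pipeline on the same training set and predicting on it gives a much smaller MAE, about 35. Predicting on the unseen test set gives an MAE of about 140, which is consistent with the note in the exported pipeline.
I am a bit confused and would like to know how to reproduce the reported error score on the training set.
Is the pipeline overfitting?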
from sklearn.model_selection import RepeatedKFold
from tpot import TPOTRegressor

cv = RepeatedKFold(n_splits=4, n_repeats=1, random_state=1)
model = TPOTRegressor(generations=10, population_size=25, offspring_size=None, mutation_rate=0.9,
                      crossover_rate=0.1, scoring='neg_mean_absolute_error', cv=cv,
                      subsample=0.75, n_jobs=-1, max_time_mins=None,
                      max_eval_time_mins=5, random_state=42, config_dict=None, template=None,
                      warm_start=False, memory=None,
                      use_dask=False, periodic_checkpoint_folder=None, early_stop=3, verbosity=2,
                      disable_update_check=False, log_file=None)
model.fit(train_df[x], train_df[y])
# The exported model
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# Average CV score on the training set was: -128.90187963562252
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=LassoLarsCV(normalize=True)),
    StackingEstimator(estimator=ExtraTreesRegressor(bootstrap=True, max_features=0.4, min_samples_leaf=1,
                                                    min_samples_split=7, n_estimators=100)),
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=True, max_features=0.15000000000000002, min_samples_leaf=9,
                        min_samples_split=7, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

# Refit on the training set, then predict the held-out test set
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
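
For reference, here is a minimal sketch of the comparison I think I should be making. It is not my real setup: the data comes from `make_regression` and a plain `ExtraTreesRegressor` stands in for `exported_pipeline` (both are assumptions for illustration). It scores the estimator with the same `RepeatedKFold` splitter and `neg_mean_absolute_error` scorer via `cross_val_score`, then compares that with the MAE obtained by fitting and predicting on the same rows. Since TPOT was also run with `subsample=0.75`, I would not expect an exact match with the number in the exported file even when doing this on my real data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for train_df[x], train_df[y] (assumption, not the real data).
X_train, y_train = make_regression(n_samples=200, n_features=10,
                                   noise=20.0, random_state=0)

# Placeholder estimator; with the real setup this would be exported_pipeline.
estimator = ExtraTreesRegressor(n_estimators=50, random_state=42)

# Same CV scheme and scorer as passed to TPOTRegressor.
cv = RepeatedKFold(n_splits=4, n_repeats=1, random_state=1)
cv_scores = cross_val_score(estimator, X_train, y_train,
                            scoring="neg_mean_absolute_error", cv=cv)
cv_mae = -cv_scores.mean()  # scorer is negated MAE, so flip the sign

# In-sample error: fit and predict on the very same rows.
estimator.fit(X_train, y_train)
in_sample_mae = mean_absolute_error(y_train, estimator.predict(X_train))

print(f"cross-validated MAE: {cv_mae:.2f}")
print(f"in-sample MAE:       {in_sample_mae:.2f}")
```

In this sketch the in-sample MAE comes out far below the cross-validated one, which is the same pattern as my 35 versus 129.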
Thanks in advance.