python - How do I nest multiple Sklearn regressors?

Question

I'm trying to implement, as standalone code, a nested regression model that I got as output from TPOT. The TPOT output is:

RandomForestRegressor(XGBRegressor(XGBRegressor(**args1), **args2), **args3)

My code so far:

from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor

xgb1 = XGBRegressor(**args1)
xgb2 = XGBRegressor(**args2)
rf = RandomForestRegressor(**args3)

I'm not sure how to combine them correctly in the order given in the TPOT output.

score 0 · Accepted Answer

The TPOT classifier and regressor provide a scikit-learn pipeline object that already does this for you.

If you look at the TPOT API, both TPOTClassifier and TPOTRegressor expose an attribute, fitted_pipeline_, which holds the best scikit-learn pipeline TPOT could find. An example of a scikit-learn pipeline:

PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    XGBRegressor(learning_rate=0.1, max_depth=4, min_child_weight=14, n_estimators=100, n_jobs=1, objective="reg:squarederror", subsample=1.0, verbosity=0)
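To see how such a pipeline is laid out, here is a minimal sketch. It assumes only scikit-learn is installed and swaps in `GradientBoostingRegressor` for `XGBRegressor` (an assumption, made so the snippet has no xgboost dependency). The pipeline is built by hand rather than read from `fitted_pipeline_`.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hand-built stand-in for a pipeline that TPOT might return.
# GradientBoostingRegressor replaces XGBRegressor purely for illustration.
pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    GradientBoostingRegressor(learning_rate=0.1, max_depth=4, n_estimators=100),
)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# Individual steps can be retrieved by name to read their parameters.
final_model = pipeline.named_steps["gradientboostingregressor"]
print(final_model.get_params()["max_depth"])
```

The same `steps` and `named_steps` accessors should work on any scikit-learn `Pipeline`, including one taken from a fitted TPOT object.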

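You can dump it to disk and load it later, so you don't have to retrain your model. Alternatively, you can use the export function built into the TPOT classifier and regressor to write the optimized pipeline out as Python code, so that you can re-fit your model: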

tpot.export('tpot_digits_pipeline.py')
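For the "dump it and load it later" route, a minimal sketch with joblib might look like the following. A small hand-built pipeline stands in for `tpot.fitted_pipeline_`, on the assumption that any fitted scikit-learn pipeline is persisted the same way. The file name and the toy data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data; in practice this would be your own training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] * 2.0 + X[:, 1] ** 2

# Stand-in for `tpot.fitted_pipeline_` (assumption: any fitted
# scikit-learn pipeline can be saved and restored this way).
fitted_pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
fitted_pipeline.fit(X, y)

# Dump the fitted pipeline to disk, then load it back without retraining.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "best_pipeline.joblib")  # hypothetical file name
    joblib.dump(fitted_pipeline, model_path)
    restored_pipeline = joblib.load(model_path)

# The restored pipeline should reproduce the original predictions.
same_predictions = np.allclose(fitted_pipeline.predict(X), restored_pipeline.predict(X))
print(same_predictions)
```

Files written this way are pickle-based, so they should only be loaded from sources you trust, ideally with the same library versions that produced them.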

If for some reason you only have the output you posted in the question, you can recreate the scikit-learn pipeline like this:

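import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from tpot.builtins import StackingEstimator  # location in classic (pre-1.0) TPOT
from xgboost import XGBRegressor

# Load the data; replace the path and separator placeholders with your own.
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
    train_test_split(features, tpot_data['target'], random_state=42)

# TPOT's nested notation reads inside out. The estimators cannot be passed to
# each other as constructor arguments; instead each inner regressor is wrapped
# in StackingEstimator, which hands its predictions to the next step as an
# extra feature.
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=XGBRegressor(<replace with actual arg list>)),  # innermost (args1)
    StackingEstimator(estimator=XGBRegressor(<replace with actual arg list>)),  # middle (args2)
    RandomForestRegressor(<replace with actual arg list>)  # outermost (args3)
)

exported_pipeline.fit(training_features, training_target)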
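As far as I understand TPOT's notation, `RandomForestRegressor(XGBRegressor(XGBRegressor(...)))` reads inside out: the innermost model is fitted first, and its predictions are passed along as an additional feature to the next model. The sketch below illustrates that idea with scikit-learn alone. `PredictionAppender` is a hypothetical helper written for this example, not TPOT's own class, and `GradientBoostingRegressor` stands in for `XGBRegressor`. The data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.pipeline import make_pipeline


class PredictionAppender(TransformerMixin, BaseEstimator):
    """Hypothetical helper: fits a regressor and adds its predictions as a feature."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        predictions = self.estimator_.predict(X).reshape(-1, 1)
        # Original features plus one extra column of predictions.
        return np.hstack((X, predictions))


# Synthetic regression data with 4 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 4))
y = X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=80)

# Innermost model first, outermost model last.
nested = make_pipeline(
    PredictionAppender(GradientBoostingRegressor(n_estimators=20, random_state=0)),
    PredictionAppender(GradientBoostingRegressor(n_estimators=20, random_state=1)),
    RandomForestRegressor(n_estimators=20, random_state=0),
)
nested.fit(X, y)

# Each wrapper adds one column, so the forest sees 4 + 2 = 6 features.
n_features_seen = nested[-1].n_features_in_
print(n_features_seen)
print(nested.predict(X[:5]).shape)
```

Written as a flat pipeline, the nesting order becomes explicit: the steps run top to bottom, and only the last step produces the final prediction.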
