jupyter-notebook - Trouble tuning CatBoost hyperparameters

Question

I'm working on Kaggle's Bulldozer-blue-book project. I'm currently trying CatBoost to see whether it improves my model. I instantiate and fit CatBoost like this:

from catboost import CatBoostRegressor

cat_regressor = CatBoostRegressor()
cat_regressor.fit(Xtrain[:100000], ytrain[:100000])

Then I try to tune the hyperparameters with RandomizedSearchCV:

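%%time
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

# Search space: number of boosting iterations, tree depth, and learning rate
cat_grid = {
    'iterations': np.arange(10, 1000, 10),
    'depth': np.arange(2, 16, 2),
    'learning_rate': [0.01, 0.05, 0.1]
}

# Sample 250 parameter combinations, each evaluated with 5-fold cross-validation
cat_model_rs = RandomizedSearchCV(estimator=cat_regressor,
                                  param_distributions=cat_grid,
                                  n_iter=250,
                                  cv=5,
                                  verbose=True)

cat_model_rs.fit(Xtrain[:100000], ytrain[:100000])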

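So far, the search takes a very long time to fit these parameters (far longer than when I tuned RandomForestRegressor). Yesterday, running on the GPU, I got a "kernel stopped" error (I don't remember exactly how Jupyter worded it). Today I'm running on the CPU. The search is still going at full load, and at this point it feels as if the model is stuck in an infinite loop and I'm just waiting for the kernel to die. I also tried Google Colab, but the cell that searches for hyperparameters timed out there too. I'm at a loss.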

I'm new to CatBoost. Does anyone know whether I'm missing a parameter, or whether RandomizedSearchCV isn't fully supported for CatBoost?
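For a sense of scale, the size of the search described above can be worked out from the settings in the question alone. This is a rough sketch, not a timing estimate: it only counts model fits and boosting rounds, and it assumes scikit-learn's default `refit=True`, which adds one final fit on the full data.

```python
# Search space from the question, written with plain ranges.
iterations_choices = list(range(10, 1000, 10))  # 10, 20, ..., 990
depth_choices = list(range(2, 16, 2))           # 2, 4, ..., 14

n_iter = 250  # candidates sampled by RandomizedSearchCV
cv = 5        # cross-validation folds per candidate

# One fit per candidate per fold, plus one final refit (assumes refit=True).
total_fits = n_iter * cv + 1

# Iterations are sampled uniformly from the list, so the expected value
# per fit is the mean of the list.
mean_iterations = sum(iterations_choices) / len(iterations_choices)

# Expected boosting rounds across the cross-validation fits only
# (the refit is excluded because the winning candidate is unknown).
expected_boosting_rounds = n_iter * cv * mean_iterations

print(total_fits, mean_iterations, expected_boosting_rounds)
```

Each of those fits trains on roughly 80,000 of the 100,000 rows, and some candidates use trees of depth 14, so a long runtime is what these settings would be expected to produce rather than a sign of an infinite loop.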

score 0 · Accepted Answer

I figured out why this wasn't working. Apparently `iterations` can't take values above 500, so capping it at that value solved my problem.
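A minimal sketch of what the capped search might look like. The 500 ceiling on `iterations` is taken from the answer as-is; the helper name `run_capped_search` and its defaults `n_iter=20` and `cv=3` are assumptions made for illustration and do not come from the original post.

```python
# Capped search space: the ceiling of 500 on iterations comes from the answer.
capped_grid = {
    "iterations": list(range(10, 501, 10)),  # 10, 20, ..., 500
    "depth": list(range(2, 16, 2)),          # 2, 4, ..., 14
    "learning_rate": [0.01, 0.05, 0.1],
}


def run_capped_search(X, y, n_iter=20, cv=3):
    """Hypothetical helper: randomized search over the capped grid.

    n_iter and cv are illustrative defaults, smaller than in the question.
    """
    # Imported here so the grid can be inspected without catboost installed.
    from catboost import CatBoostRegressor
    from sklearn.model_selection import RandomizedSearchCV

    search = RandomizedSearchCV(
        estimator=CatBoostRegressor(verbose=0),  # silence per-iteration logs
        param_distributions=capped_grid,
        n_iter=n_iter,
        cv=cv,
        verbose=1,
        random_state=0,
    )
    return search.fit(X, y)
```

Calling `run_capped_search(Xtrain[:100000], ytrain[:100000])` would return the fitted search object, whose `best_params_` attribute holds the selected combination.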
