
I'm doing some hyperparameter tuning on a small dataset using Azure Databricks + Hyperopt + MLflow. The job seems to run and I get output in MLflow, but it finishes with the following error message:

Hyperopt failed to execute mlflow.end_run() with tracking URI: databricks

Here is my code, with some details masked:

from pyspark.sql import SparkSession

# spark session initialization
spark = (SparkSession.builder.getOrCreate())
sc = spark.sparkContext

# Data Processing
import pandas as pd
import numpy as np
# Hyperparameter Tuning
from hyperopt import fmin, tpe, hp, anneal, Trials, space_eval, SparkTrials, STATUS_OK
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
# Modeling
from sklearn.ensemble import RandomForestClassifier
# cleaning
import gc
# tracking
import mlflow
# track runtime
from datetime import date, datetime

mlflow.set_experiment('/user/myname/myexp')
# notebook settings \ variable settings
n_splits = #
n_repeats = #
max_evals = #

dfL = pd.read_csv("/my/data/loc/mydata.csv")

x_train = dfL[['f1','f2','f3']]
y_train = dfL['target']

def define_model(params):
    model = RandomForestClassifier(n_estimators=int(params['n_estimators']),
                                   criterion=params['criterion'], 
                                   max_depth=int(params['max_depth']), 
                                   min_samples_split=params['min_samples_split'], 
                                   min_samples_leaf=params['min_samples_leaf'], 
                                   min_weight_fraction_leaf=params['min_weight_fraction_leaf'], 
                                   max_features=params['max_features'], 
                                   max_leaf_nodes=None, 
                                   min_impurity_decrease=params['min_impurity_decrease'], 
                                   min_impurity_split=None, 
                                   bootstrap=params['bootstrap'], 
                                   oob_score=False, 
                                   n_jobs=-1, 
                                   random_state=int(params['random_state']), 
                                   verbose=0, 
                                   warm_start=False, 
                                   class_weight={0:params['class_0_weight'], 1:params['class_1_weight']})
    return model


space = {'n_estimators': hp.quniform('n_estimators', #, #, #),
         'criterion': hp.choice('#', ['#','#']),
         'max_depth': hp.quniform('max_depth', #, #, #),
         'min_samples_split': hp.quniform('min_samples_split', #, #, #),
         'min_samples_leaf': hp.quniform('min_samples_leaf', #, #, #),
         'min_weight_fraction_leaf': hp.quniform('min_weight_fraction_leaf', #, #, #),
         'max_features': hp.quniform('max_features', #, #, #),
         'min_impurity_decrease': hp.quniform('min_impurity_decrease', #, #, #),
         'bootstrap': hp.choice('bootstrap', [#,#]),
         'random_state': hp.quniform('random_state', #, #, #),
         'class_0_weight': hp.choice('class_0_weight', [#,#,#]),
         'class_1_weight': hp.choice('class_1_weight', [#,#,#])}

# define hyperopt objective
def objective(params, n_splits=n_splits, n_repeats=n_repeats):

    # define model
    model = define_model(params)
    # get cv splits
    kfold = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=1331)
    # define and run sklearn cv scorer
    scores = cross_val_score(model, x_train, y_train, cv=kfold, scoring='roc_auc')
    score = scores.mean()

    return {'loss': score*(-1), 'status': STATUS_OK}

spark_trials = SparkTrials(parallelism=36, spark_session=spark)
with mlflow.start_run():
  best = fmin(objective, space, algo=tpe.suggest, trials=spark_trials, max_evals=max_evals)

And then at the end I get..

100%|██████████| 200/200 [1:35:28<00:00, 100.49s/trial, best loss: -0.9584565527065526]

Hyperopt failed to execute mlflow.end_run() with tracking URI: databricks

Exception: 'MLFLOW_RUN_ID'

Total Trials: 200: 200 succeeded, 0 failed, 0 cancelled.

My Azure Databricks cluster is..

6.6 ML (includes Apache Spark 2.4.5, Scala 2.11)
Standard_DS3_v2
min 9 max 18 nodes

Am I doing something wrong, or is this a bug?


1 Answer


This message is a known (but harmless) issue that has been fixed in MLR 7.0. I tried executing it on a DBR 7.0 ML cluster and it works there.

You don't need start_run(); SparkTrials starts a run for you automatically, and that extra run is the only reason for the error.

So with SparkTrials the tuning still works without start_run(); SparkTrials should start the runs and log them for you automatically.
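
A minimal sketch of what that looks like, reusing the objective, space, spark session, and max_evals already defined in the question, so only the with mlflow.start_run(): wrapper is dropped:

# let SparkTrials manage the MLflow runs itself -- no explicit start_run()
spark_trials = SparkTrials(parallelism=36, spark_session=spark)
best = fmin(objective, space, algo=tpe.suggest,
            trials=spark_trials, max_evals=max_evals)

# SparkTrials should start the run(s) and log the trials automatically,
# under the experiment set earlier with mlflow.set_experiment('/user/myname/myexp')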

Answered 2020-06-11T21:22:46.427