Optuna lets users search a parameter space with the `suggest_` API. This is easy and clever.
However, there are some parameters I want to keep constant. For example, with scikit-learn's DBSCAN implementation:
Search with the `suggest_` API:
- eps
- min_samples
Keep fixed:
- metric
- n_jobs
Someone might suggest that I simply hard-code `metric` and `n_jobs` in my objective function, like so:
def objective(trial: Trial) -> float:
    # excerpt: `self` is the Objective instance from the full code further below
    eps: float = trial.suggest_float(
        name='eps',
        low=self.eps_minimum,
        high=self.eps_maximum,
    )
    min_samples: int = trial.suggest_int(
        name='min_samples',
        low=self.min_samples_minimum,
        high=self.min_samples_maximum,
    )
    clustering = DBSCAN(
        eps=eps,
        min_samples=min_samples,
        metric='cosine',  # <-- hard-coded
        n_jobs=16,  # <-- hard-coded
    ).fit_predict(X=vectors)  # <-- `vectors` is in scope but not shown
    return adjusted_rand_score(
        labels_true=labels,  # <-- `labels` is in scope but not shown
        labels_pred=clustering,
    )
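However, I would also like to be able to retrieve the values of these parameters from the Optuna study later. As far as I know, Optuna does not provide anything like a `study.fixed_params` attribute.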
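To expose these fixed parameters, I resorted to setting them through the `suggest_` API and offering only a single option for each of them in the search space. I found that setting `trial.params['custom'] = 42` inside the objective function does not work.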
I am not happy with this solution, because the `study.best_params` output implies (at least to me) that these values were found during the hyperparameter search.
Question: is this the canonical way to achieve my goal, or is there another way?
I am new to Optuna, so please forgive my naivety.
from typing import Tuple
from os import sched_getaffinity
from dataclasses import dataclass
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score
import optuna
from optuna.trial import Trial

@dataclass
class ObjectiveDBSCAN:
    eps_minimum: float
    eps_maximum: float
    min_samples_minimum: int
    min_samples_maximum: int
    dbscan_metric: Tuple[str, ...] = ('cosine',)
    # CPUs usable by this process (pid 0), minus one, but at least 1
    n_jobs: int = len(sched_getaffinity(0)) - 1 or 1

    def __call__(self, trial: Trial) -> float:
        eps: float = trial.suggest_float(
            name='eps',
            low=self.eps_minimum,
            high=self.eps_maximum,
        )
        min_samples: int = trial.suggest_int(
            name='min_samples',
            low=self.min_samples_minimum,
            high=self.min_samples_maximum,
        )
        # single-option "search" to expose the fixed metric
        dbscan_metric: str = trial.suggest_categorical(
            name='metric',
            choices=self.dbscan_metric,
        )
        # low == high pins n_jobs to one value
        n_jobs: int = trial.suggest_int(
            name='n_jobs',
            low=self.n_jobs,
            high=self.n_jobs,
        )
        clustering = DBSCAN(
            eps=eps,
            min_samples=min_samples,
            metric=dbscan_metric,
            n_jobs=n_jobs,
        ).fit_predict(X=vectors)  # `vectors` is in scope but not shown
        return adjusted_rand_score(
            labels_true=labels,  # `labels` is in scope but not shown
            labels_pred=clustering,
        )

objective_dbscan: ObjectiveDBSCAN = ObjectiveDBSCAN(...)
study = optuna.create_study(direction='maximize')
study.optimize(objective_dbscan, n_trials=...)
...
>>> study.best_params
# this returns a dictionary with eps, min_samples, metric, and n_jobs
Optuna version: 2.9.1
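The `n_jobs` default in the dataclass relies on `os.sched_getaffinity`, which Python only provides on some Unix platforms (e.g. Linux), and whose argument is a process ID where 0 means the calling process. A portable sketch, assuming a fallback to `os.cpu_count()` is acceptable; `default_n_jobs` is a hypothetical helper name.

```python
import os


def default_n_jobs() -> int:
    """Hypothetical helper: leave one usable CPU free, but never return less than 1."""
    if hasattr(os, "sched_getaffinity"):
        # pid 0 means "the calling process": count the CPUs it is allowed to run on.
        usable = len(os.sched_getaffinity(0))
    else:
        # Not available on e.g. macOS or Windows; cpu_count() may return None.
        usable = os.cpu_count() or 1
    return max(usable - 1, 1)


print(default_n_jobs())
```

In a dataclass this could be wired in as `n_jobs: int = field(default_factory=default_n_jobs)`, so the value is computed per instance rather than once at import time.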