optuna - What is the canonical way to set fixed parameters and retrieve them after the study completes?

Question

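Optuna lets users search a parameter space through its suggest_ API. This is easy and clever.

However, there are some parameters I would like to keep constant. For example, with Scikit-Learn's DBSCAN implementation:

Searched with the suggest_ API:

eps
min_samples

Kept fixed:

metric
n_jobs

One might suggest that I hard-code metric and n_jobs in my objective function, like this: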

def objective(trial: Trial) -> float:
    eps: float = trial.suggest_float(
        name='eps',
        low=self.eps_minimum,
        high=self.eps_maximum,
    )
    min_samples: int = trial.suggest_int(
        name='min_samples',
        low=self.min_samples_minimum,
        high=self.min_samples_maximum,
    )
    clustering = DBSCAN(
        eps=eps,
        min_samples=min_samples,
        metric='cosine',         # <-- hard-coded
        n_jobs=16,               # <-- hard-coded
    ).fit_predict(X=vectors)     # <-- `vectors` is in scope but not shown
    return adjusted_rand_score(
        labels_true=labels,      # <-- `labels` is in scope but not shown
        labels_pred=clustering,
    )

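However, I would also like to be able to retrieve the values of these fixed parameters from the Optuna study later. As far as I know, Optuna does not provide any study.fixed_params attribute.

To expose these fixed parameters, I resorted to setting them through the suggest_ API and giving the search space only one possible option. I also found that setting trial.params['custom'] = 42 inside the objective function does not work.

I am not happy with this solution, because the output of study.best_params implies (at least to me) that these values were found during the hyperparameter search.

Question: is this the canonical way to achieve my goal, or is there another way?

I am new to Optuna, so please forgive my naivety.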

from dataclasses import dataclass
from os import sched_getaffinity
from typing import Sequence

import optuna
from optuna.trial import Trial
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score


@dataclass
class ObjectiveDBSCAN:
    eps_minimum: float
    eps_maximum: float
    min_samples_minimum: int
    min_samples_maximum: int
    dbscan_metric: Sequence[str] = ('cosine',)
    # Leave one core free; pid 0 refers to the current process.
    n_jobs: int = len(sched_getaffinity(0)) - 1 or 1

    def __call__(self, trial: Trial) -> float:
        eps: float = trial.suggest_float(
            name='eps',
            low=self.eps_minimum,
            high=self.eps_maximum,
        )
        min_samples: int = trial.suggest_int(
            name='min_samples',
            low=self.min_samples_minimum,
            high=self.min_samples_maximum,
        )
        dbscan_metric: str = trial.suggest_categorical(
            name='metric',
            choices=self.dbscan_metric,
        )
        n_jobs: int = trial.suggest_int(
            name='n_jobs',
            low=self.n_jobs,
            high=self.n_jobs,
        )
        clustering = DBSCAN(
            eps=eps,
            min_samples=min_samples,
            metric=dbscan_metric,
            n_jobs=n_jobs,
        ).fit_predict(X=vectors)
        return adjusted_rand_score(
            labels_true=labels,
            labels_pred=clustering,
        )

objective_dbscan: ObjectiveDBSCAN = ObjectiveDBSCAN(...)
study = optuna.create_study(direction='maximize')
study.optimize(objective_dbscan, n_trials=...)

...

>>> study.best_params
# this returns a dictionary with eps, min_samples, metric, and n_jobs

Optuna version: 2.9.1

score 0 · Accepted Answer

You can use user attributes.

Inside objective():

njobs = 16
trial.set_user_attr("n_jobs", njobs)

Then retrieve it from the study like this:

njobs = study.best_trial.user_attrs['n_jobs']

score -1 · Accepted Answer

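What might be useful here is the PartialFixedSampler.

In the example they give, a parameter is tuned at first and then fixed after a certain number of trials. In your case, you could fix several parameters from the very beginning.

Note that this is currently marked as an experimental feature.

An MWE might look like this: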

import optuna


class Objective:
    fixed_params = {
        'y': -1,
    }

    def __call__(self, trial):
        x = trial.suggest_float("x", -1, 1)
        # This is overridden by the fixed parameter.
        y = trial.suggest_int("y", -1, 1)
        return x ** 2 + y


objective = Objective()
study = optuna.create_study()
partial_sampler = optuna.samplers.PartialFixedSampler(objective.fixed_params, study.sampler)
study.sampler = partial_sampler
study.optimize(objective, n_trials=10)

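Here, y is a fixed parameter, but because it is still defined through trial.suggest_int(...), it still appears in the set of best parameters (that is, the parameter list will contain both x and y). If you do not want it to appear in the best-parameter list, simply remove that line and replace it with y = self.fixed_params['y']. The parameter list will then contain only x.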
