我已提交在远程计算上运行的 autoML(Standard_D12_v2 - 4 节点集群 28GB,每个 4 核)
我的输入文件大约是 350 MB。
状态为“准备中”超过 2 小时。然后它失败了。
User error: Run timed out. No model completed training in the specified time. Possible solutions:
1) Please check if there are enough compute resources to run the experiment.
2) Increase experiment timeout when creating a run.
3) Subsample your dataset to decrease featurization/training time.
下面是我的 python-Notebook 代码,请帮忙。
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig
ws = Workspace.from_config()
experiment=Experiment(ws, 'nyc-taxi')
cpu_cluster_name = "low-cluster"
compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
data = "https://betaml4543906917.blob.core.windows.net/betadata/2015_08.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)
label_column_name = 'totalAmount'
automl_settings = {
"n_cross_validations": 3,
"primary_metric": 'normalized_root_mean_squared_error',
"enable_early_stopping": True,
"max_concurrent_iterations": 2, # This is a limit for testing purpose, please increase it as per cluster size
"experiment_timeout_hours": 2, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible
"verbosity": logging.INFO,
}
automl_config = AutoMLConfig(task = 'regression',
debug_log = 'automl_errors.log',
compute_target = compute_target,
training_data = training_data,
label_column_name = label_column_name,
**automl_settings
)
remote_run = experiment.submit(automl_config, show_output = False)