我正在使用光线调整运行超参数优化,并且遇到 MemoryError 崩溃。
从概念上讲,代码如下: 训练函数:
def training_func(config, checkpoint_dir=None):
args = config["args"]
train_args = config["train_args"]
python_class = config["class"]
model = python_class(**args)
if checkpoint_dir:
model.load(checkpoint_dir)
else:
scores = model.traintest(**train_args)
with tune.checkpoint_dir(step=0) as checkpoint_dir:
model.save(checkpoint_dir)
# save some data
if "some_key" in training_args:
with open(join(checkpoint_dir, "data.pkl"), "wb") as f:
pickle.dump(train_args["some_key"], f)
del model
gc.collect()
tune.report(**scores)
调调:
# [.....]
ray.shutdown()
ray.init()
config = {
"python_class": python_class,
"train_args": some_dictionary,
"args": some_other_dictionary}
analysis = tune.run(
training_func, progress_reporter=tune.CLIReporter(),
metric="macro_avg_f1-score",
keep_checkpoints_num=1,
num_samples=1,
local_dir="/path/to/tune/results/",
config=config)
运行它,内存使用量持续增长,尽管在目标函数训练后释放了实例化模型。调优迭代完成后如何丢弃资源?如果这不可能,我如何确保只有一个调优实验同时运行(以便我可以手动从内存中提取不需要的变量)?
SO中的这些问题是相似的,我已经采纳了他们的一些建议(例如明确调用GC),但问题没有解决。