pytorch - How do I use GPUs with Ray in PyTorch? Should I specify num_gpus for the remote class?

Question

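When I use Ray with PyTorch, I do not set any num_gpus flag on the remote class.

I get the following error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False.

The overall flow is: I create a remote class and pass it a PyTorch model's state_dict() (created in the main function). In the main function, torch.cuda.is_available() is True, but inside the remote function, torch.cuda.is_available() is False. Thanks.

I tried setting num_gpus=1 and ran into a new problem: the program hangs. Below is a minimal example that reproduces the issue. Thanks.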

import ray


@ray.remote(num_gpus=1)
class Worker(object):
    def __init__(self, args):
        self.args = args
        self.gen_frames = 0

    def set_gen_frames(self, value):
        self.gen_frames = value
        return self.gen_frames

    def get_gen_num(self):
        return self.gen_frames


class Parameters:
    def __init__(self):
        self.is_cuda = False
        self.is_memory_cuda = True
        self.pop_size = 10


if __name__ == "__main__":
    ray.init()
    args = Parameters()
    workers = [Worker.remote(args) for _ in range(args.pop_size)]
    get_num_ids = [worker.get_gen_num.remote() for worker in workers]
    gen_nums = ray.get(get_num_ids)
    print(gen_nums)

score 5 · Accepted Answer

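If you also want to deploy the model on a GPU, you need to make sure your actor or task actually has access to a GPU: declare it with @ray.remote(num_gpus=1), which ensures that torch.cuda.is_available() is True inside that remote function. If you want to deploy the model on the CPU instead, you need to say so when loading the model; see for example https://github.com/pytorch/pytorch/issues/9139.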
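The CPU path can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the helper names pick_device, pack_state_dict and unpack_state_dict are hypothetical, and the torch.nn.Linear model is a stand-in. The idea is to serialize the state_dict to bytes in the main process and to deserialize it in the worker with map_location, so that tensors saved from a GPU are remapped onto whatever device the worker can actually see.

```python
import io

import torch


def pick_device() -> torch.device:
    """Return the GPU if this process can see one, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def pack_state_dict(model: torch.nn.Module) -> bytes:
    """Serialize a model's state_dict to bytes (e.g. in the main process)."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getvalue()


def unpack_state_dict(payload: bytes, device: torch.device) -> dict:
    """Deserialize a state_dict, remapping every tensor onto `device`.

    Passing map_location is what avoids the "Attempting to deserialize
    object on a CUDA device" error when the tensors were saved from a GPU.
    """
    return torch.load(io.BytesIO(payload), map_location=device)


if __name__ == "__main__":
    # Stand-in model; it lives on the GPU when one is visible here.
    source = torch.nn.Linear(4, 2).to(pick_device())
    payload = pack_state_dict(source)

    # Simulate a worker without GPU access by forcing the CPU.
    state = unpack_state_dict(payload, torch.device("cpu"))
    target = torch.nn.Linear(4, 2)
    target.load_state_dict(state)
    print({name: tensor.device.type for name, tensor in state.items()})
```

With Ray, the main function would pass the bytes from pack_state_dict to the actor, and the actor would call unpack_state_dict(payload, pick_device()); an actor declared without num_gpus sees no GPU and therefore takes the CPU branch. As for the hang after setting num_gpus=1, a likely explanation (assuming the machine has fewer than 10 GPUs) is that each such actor reserves a whole GPU for its lifetime, so the 10 actors in the example cannot all start and ray.get keeps waiting on the pending ones. Ray also accepts fractional values such as num_gpus=0.1 when several actors should share one GPU.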
