I'm on Windows 10. How can I fix the following error:

RuntimeError: No rendezvous handler for env://?

Traceback (most recent call last):

  File "main.py", line 312, in <module>
    torch.distributed.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)

  File "F:\Anaconda3\envs\swin\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout

  File "F:\Anaconda3\envs\swin\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://

Traceback (most recent call last):

  File "F:\Anaconda3\envs\swin\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

  File "F:\Anaconda3\envs\swin\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)

  File "F:\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 260, in <module>
    main()

  File "F:\Anaconda3\envs\swin\lib\site-packages\torch\distributed\launch.py", line 256, in main
    cmd=cmd)

subprocess.CalledProcessError: Command '['F:\\Anaconda3\\envs\\swin\\python.exe', '-u', 'main.py', '--local_rank=0', '--cfg=configs/swin_tiny_patch4_window7_224.yaml', '--data-path=imagenet', '--batch-size=64']' returned non-zero exit status 1.

1 Answer

In rendezvous.py there is:

if sys.platform != 'win32':
    register_rendezvous_handler("tcp", _tcp_rendezvous_handler)
    register_rendezvous_handler("env", _env_rendezvous_handler)

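So in your case (sys.platform is 'win32' on Windows), the "env" handler is never registered. That is what causes the error you are getting: RuntimeError: No rendezvous handler for env://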
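
To make the mechanism concrete, here is a minimal toy model of a scheme-to-handler registry. It is a sketch only, not PyTorch's actual code: the registry dict, the placeholder lambda handlers, and the helpers register_defaults and try_rendezvous are hypothetical names introduced for illustration. Only the platform check and the error message are taken from the snippet and traceback above.

```python
from urllib.parse import urlparse

# Toy registry (hypothetical): maps a URL scheme such as "env" to a handler.
_handlers = {}


def register_rendezvous_handler(scheme, handler):
    """Store a handler under its URL scheme."""
    _handlers[scheme] = handler


def register_defaults(platform):
    """Mimic the platform check quoted above, using placeholder handlers."""
    _handlers.clear()
    if platform != "win32":
        register_rendezvous_handler("tcp", lambda url: "tcp rendezvous")
        register_rendezvous_handler("env", lambda url: "env rendezvous")


def rendezvous(url):
    """Look up the handler for the URL's scheme, failing if none is registered."""
    scheme = urlparse(url).scheme
    if scheme not in _handlers:
        raise RuntimeError("No rendezvous handler for {}://".format(scheme))
    return _handlers[scheme](url)


def try_rendezvous(platform, url="env://"):
    """Return the handler result, or the error message if the lookup fails."""
    register_defaults(platform)
    try:
        return rendezvous(url)
    except RuntimeError as err:
        return str(err)


print(try_rendezvous("win32"))
print(try_rendezvous("linux"))
```

With platform set to "win32" the lookup for "env" fails because nothing was registered under that scheme, which mirrors the traceback in the question.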

There seems to be a solution at https://github.com/PyTorchLightning/pytorch-lightning/issues/5358?force_isolation=true

It suggests changing the accelerator to dp, because ddp is not yet supported on Windows.
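
As a rough sketch of that suggestion, the choice could be made from the platform at startup. The helper name pick_accelerator is hypothetical, and how the returned string is consumed depends on your training code (for example, an accelerator setting like the one discussed in the linked issue); treat this as an assumption, not a confirmed API.

```python
import sys


def pick_accelerator(platform=sys.platform):
    """Return "dp" on Windows, where ddp is reported as unsupported, else "ddp"."""
    return "dp" if platform == "win32" else "ddp"


print(pick_accelerator())
```

Passing the platform as a parameter keeps the helper easy to check on any machine, e.g. pick_accelerator("win32").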

Answered 2021-11-05T23:55:39.303