python - How do I know how many GPUs are being used in PyTorch?

Question

The bash file I use to launch training looks like this:

CUDA_VISIBLE_DEVICES=3,4 python -m torch.distributed.launch \
--nproc_per_node=2  train.py \
--batch_size 6 \
--other_args

I found that the batch size of the tensors on each GPU is actually batch_size / num_of_gpu = 6 / 2 = 3.

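When I initialize my network, I need to know the batch size on each GPU. (P.S. At this stage I can't use input_tensor.shape to get the size of the batch dimension, because there is no data yet.)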

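Somehow I can't find where PyTorch stores the --nproc_per_node argument. So how can I find out how many GPUs are being used, without passing the number manually through --other_args?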

score 1 · Accepted Answer


I think you are looking for torch.distributed.get_world_size(); it tells you how many processes were created.
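As a minimal sketch of how this could be used at network-initialization time: with one process per GPU, as in the launch command from the question, the world size equals the number of GPUs. The helper names current_world_size and per_process_batch_size are hypothetical, not part of PyTorch. The fallback to the WORLD_SIZE environment variable assumes the script was started by a launcher that exports it, as torch.distributed.launch does, and covers the case where the process group has not been initialized yet.

```python
import os


def current_world_size() -> int:
    """Return the number of processes in the job, or 1 for a non-distributed run."""
    try:
        # Imported lazily so the arithmetic helper below also works without PyTorch.
        import torch.distributed as dist
    except ImportError:
        dist = None

    # get_world_size() is only valid once the default process group exists.
    if dist is not None and dist.is_available() and dist.is_initialized():
        return dist.get_world_size()

    # Assumption: the launcher exported WORLD_SIZE; default to a single process.
    return int(os.environ.get("WORLD_SIZE", "1"))


def per_process_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch size evenly across processes (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size is not divisible by world size")
    return global_batch_size // world_size
```

With the question's settings (batch size 6, two processes), per_process_batch_size(6, current_world_size()) would give the per-GPU batch size of 3 that the asker observed.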

Answered 2021-08-16T13:14:58.567
