I am trying to debug the gradient flow in my neural network on Google Colab using WandB gradient visualization. Without WandB logging, training runs without errors, using 11 GB of the P100 GPU's 16 GB. However, adding the line `wandb.watch(model, log='all', log_freq=3)` causes a CUDA out-of-memory error.

How does WandB logging create additional GPU memory overhead?

Is there any way to reduce this overhead?

-- Training loop code added --

learning_rate = 0.001
num_epochs = 50

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = MyModel()

model = model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
wandb.watch(model, log='all', log_freq=3)

for epoch in range(num_epochs):
    train_running_loss = 0.0
    train_accuracy = 0.0

    model.train()

    ## training step
    for i, (name, output_array, input) in enumerate(trainloader):
        
        output_array = output_array.to(device)
        input = input.to(device)
        comb = torch.zeros(1, 1, 100, 1632, device=device)

        ## forward + backprop + loss
        output = model(input, comb)

        loss = my_loss(output, output_array)

        optimizer.zero_grad()

        loss.backward()

        ## update model params
        optimizer.step()

        train_running_loss += loss.detach().item()

        temp = get_accuracy(output, output_array)

        ## print current GPU memory usage via Colab shell escape
        print('check 13')
        !nvidia-smi | grep MiB | awk '{print $9 $10 $11}'

        train_accuracy += temp     
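(As a side note, the in-loop `!nvidia-smi` shell-out can be replaced with PyTorch's own allocator counters, which run in-process and report the caching allocator's view directly. A minimal sketch, with `log_gpu_memory` being my own helper name:)

```python
import torch

def log_gpu_memory(tag=""):
    """Print tensor-allocated vs allocator-reserved GPU memory in MiB."""
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 2**20     # MiB held by live tensors
        reserved = torch.cuda.memory_reserved() / 2**20   # MiB reserved by the caching allocator
        print(f"{tag}: allocated={alloc:.0f}MiB reserved={reserved:.0f}MiB")

log_gpu_memory("check 13")
```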

--- Edit ---

I think WandB creates an extra copy of the gradients during its logging preprocessing. Here is the traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-13de83557b55> in <module>()
     60         get_ipython().system("nvidia-smi | grep MiB | awk '{print $9 $10 $11}'")
     61 
---> 62         loss.backward()
     63 
     64         print('check 10')

4 frames
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150 
    151 

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in <lambda>(grad)
    283             self.log_tensor_stats(grad.data, name)
    284 
--> 285         handle = var.register_hook(lambda grad: _callback(grad, log_track))
    286         self._hook_handles[name] = handle
    287         return handle

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in _callback(grad, log_track)
    281             if not log_track_update(log_track):
    282                 return
--> 283             self.log_tensor_stats(grad.data, name)
    284 
    285         handle = var.register_hook(lambda grad: _callback(grad, log_track))

/usr/local/lib/python3.7/dist-packages/wandb/wandb_torch.py in log_tensor_stats(self, tensor, name)
    219         # Remove nans from tensor. There's no good way to represent that in histograms.
    220         flat = flat[~torch.isnan(flat)]
--> 221         flat = flat[~torch.isinf(flat)]
    222         if flat.shape == torch.Size([0]):
    223             # Often the whole tensor is nan or inf. Just don't log it in that case.

RuntimeError: CUDA out of memory. Tried to allocate 4.65 GiB (GPU 0; 15.90 GiB total capacity; 10.10 GiB already allocated; 717.75 MiB free; 14.27 GiB reserved in total by PyTorch)
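The failing frame is consistent with that hypothesis: wandb's gradient hook flattens the gradient and filters it with boolean-mask indexing, and mask indexing in PyTorch always materializes a new tensor on the same device as the input. A standalone sketch of that allocation behavior (not wandb's actual code; a small CPU tensor stands in for the large GPU gradient):

```python
import torch

# Boolean-mask indexing copies the selected elements into a fresh tensor,
# so filtering nan/inf values from a gradient temporarily needs roughly
# twice the gradient's memory on whatever device the gradient lives on.
grad = torch.randn(1000, 1000)      # stands in for a large GPU gradient
flat = grad.flatten()               # a view: still shares grad's storage
flat = flat[~torch.isnan(flat)]     # new allocation #1
flat = flat[~torch.isinf(flat)]     # new allocation #2 (the failing line)

# The filtered result is a separate buffer, not a view of `grad`:
assert flat.data_ptr() != grad.data_ptr()
```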

--- Update ---

Indeed, commenting out the offending line `flat = flat[~torch.isinf(flat)]` makes the logging step just barely fit in GPU memory.
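One hedged workaround sketch, under the assumption that per-gradient stats are all that is needed: register your own gradient hooks that copy each gradient to the CPU first, so the nan/inf filtering allocates in host RAM instead of on the GPU. The `watch_gradients_on_cpu` helper below is hypothetical (my own name, not a wandb API), and the commented-out `wandb.log` call marks where the histogram would be emitted:

```python
import torch
import torch.nn as nn

def watch_gradients_on_cpu(model, log_freq=3):
    """Log gradient stats from a CPU copy so no extra GPU memory is allocated."""
    def make_hook(name):
        count = [0]  # per-parameter call counter, mimicking log_freq
        def hook(grad):
            count[0] += 1
            if count[0] % log_freq:
                return
            g = grad.detach().to("cpu")   # one host-side copy per log step
            g = g[torch.isfinite(g)]      # filtering now allocates on the CPU
            # wandb.log({f"gradients/{name}": wandb.Histogram(g.numpy())})
        return hook
    for name, p in model.named_parameters():
        if p.requires_grad:
            p.register_hook(make_hook(name))

# Usage: call once after building the model, instead of wandb.watch(...).
model = nn.Linear(4, 2)
watch_gradients_on_cpu(model, log_freq=1)
model(torch.randn(8, 4)).sum().backward()
```

Increasing `log_freq` in `wandb.watch` only reduces how often the copy happens; the per-call GPU allocation stays the same, which is why moving the stats computation off the GPU helps.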
