python - PyTorch: backpropagating from the sum of matrix elements to a leaf variable

Question

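I am trying to better understand backpropagation in PyTorch. I have a code snippet that successfully backpropagates from the output d to the leaf variable a, but if I add a reshape step, the backward pass no longer gives the input a gradient.

I know reshape is an out-of-place operation, but I'm still not sure how to make sense of this.

Any thoughts?

Thanks.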

import torch

# Works
a = torch.tensor([1.])
a.requires_grad = True
b = torch.tensor([1.])
c = torch.cat([a, b])
d = torch.sum(c)
d.backward()

print('a gradient is')
print(a.grad)  # => tensor([1.])

# Doesn't work
a = torch.tensor([1.])
a.requires_grad = True
a = a.reshape(a.shape)
b = torch.tensor([1.])
c = torch.cat([a, b])
d = torch.sum(c)
d.backward()

print('a gradient is')
print(a.grad)  # => None

score 2 · Accepted Answer

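Edit:

Here is a detailed explanation of what is going on ("this is not a bug per se, but it is definitely a source of confusion"): https://github.com/pytorch/pytorch/issues/19778

So one solution is to explicitly ask autograd to retain the grad for the now non-leaf a: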

a = torch.tensor([1.])
a.requires_grad = True
a = a.reshape(a.shape)  # a now refers to a non-leaf tensor
a.retain_grad()         # keep .grad on this non-leaf after backward()
b = torch.tensor([1.])
c = torch.cat([a, b])
d = torch.sum(c)
d.backward()
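To see the effect of retain_grad() side by side, here is a minimal sketch. The helper name grad_after_reshape and its retain flag are made up for illustration; the tensor operations are the same ones used above.

```python
import warnings
import torch

def grad_after_reshape(retain: bool):
    """Return .grad of the reshaped (non-leaf) tensor after backward()."""
    a = torch.tensor([1.], requires_grad=True)
    a = a.reshape(a.shape)      # a now names a non-leaf tensor
    if retain:
        a.retain_grad()         # keep .grad on this non-leaf
    b = torch.tensor([1.])
    d = torch.sum(torch.cat([a, b]))
    d.backward()
    with warnings.catch_warnings():
        # Reading .grad on a non-leaf without retain_grad() emits a UserWarning.
        warnings.simplefilter("ignore")
        return a.grad

print(grad_after_reshape(retain=True))   # tensor([1.])
print(grad_after_reshape(retain=False))  # None
```

Without retain_grad(), autograd still computes the gradient flowing through the reshaped tensor; it just does not store it on that tensor.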

Old answer:

If you move a.requires_grad = True to after the reshape, it works:

a = torch.tensor([1.])
a = a.reshape(a.shape)
a.requires_grad = True  # a is still a leaf here, so its .grad gets populated
b = torch.tensor([1.])
c = torch.cat([a, b])
d = torch.sum(c)
d.backward()

This looks like a bug in PyTorch, because a.requires_grad is still True after the following:

a = torch.tensor([1.])
a.requires_grad = True
a = a.reshape(a.shape)

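This seems to be related to the fact that a is no longer a leaf in your "Doesn't work" example, whereas it is still a leaf in the other cases (print a.is_leaf to check).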
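The is_leaf check can be sketched as follows. The variable names leaf, reshaped, and late are chosen here for illustration and are not from the question; keeping the original tensor under its own name also shows that it still receives the gradient.

```python
import torch

# Case 1: requires_grad set before the reshape.
leaf = torch.tensor([1.])
leaf.requires_grad = True
reshaped = leaf.reshape(leaf.shape)   # result of an op on a tensor that requires grad

print(leaf.is_leaf)                   # True
print(reshaped.is_leaf)               # False
print(reshaped.requires_grad)         # True

# Case 2: requires_grad set after the reshape.
late = torch.tensor([1.])
late = late.reshape(late.shape)
late.requires_grad = True
print(late.is_leaf)                   # True

# Backpropagate through the non-leaf; the gradient lands on the original leaf.
b = torch.tensor([1.])
d = torch.sum(torch.cat([reshaped, b]))
d.backward()
print(leaf.grad)                      # tensor([1.])
```

In the "Doesn't work" snippet the name a is rebound to the reshaped tensor, so a.grad reads the non-leaf rather than the original leaf.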
