
I'm fairly new to PyTorch, and I'm trying to reproduce an algorithm from an academic paper that approximates a term using the Hessian matrix. I've set up a toy problem so that I can compare the results of the full Hessian with the approximation. I found this gist and have been using it to compute the full-Hessian part of the algorithm.

I am getting the error: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation."

I have looked through simple example code, the documentation, and plenty of forum posts about this issue, and I cannot find any in-place operations. Any help would be appreciated!

Here is my code:

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

torch.set_printoptions(precision=20, linewidth=180)

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)     
    grad_y = torch.zeros_like(flat_y)     
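    # vector for the vector-Jacobian products; made one-hot inside the loop below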

    for i in range(len(flat_y)):         
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)           

def hessian(y, x):
    return jacobian(jacobian(y, x, create_graph=True), x)                                             

def f(x):                                                                                             
    return x * x

np.random.seed(435537698)

num_dims = 2
num_samples = 3

X = [np.random.uniform(size=num_dims) for i in range(num_samples)]
print('X: \n{}\n\n'.format(X))

mean = torch.Tensor(np.mean(X, axis=0))
mean.requires_grad = True
print('mean: \n{}\n\n'.format(mean))

cov = torch.Tensor(np.cov(X, rowvar=False))
print('cov: \n{}\n\n'.format(cov))

with autograd.detect_anomaly():
    hessian_matrices = hessian(f(mean), mean)
    print('hessian: \n{}\n\n'.format(hessian_matrices))

And here is the output, with the stack trace:

X: 
[array([0.81700949, 0.17141617]), array([0.53579366, 0.31141496]), array([0.49756485, 0.97495776])]


mean: 
tensor([0.61678934097290039062, 0.48592963814735412598], requires_grad=True)


cov: 
tensor([[ 0.03043144382536411285, -0.05357056483626365662],
        [-0.05357056483626365662,  0.18426130712032318115]])


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5a1c492d2873> in <module>()
     42 
     43 with autograd.detect_anomaly():
---> 44     hessian_matrices = hessian(f(mean), mean)
     45     print('hessian: \n{}\n\n'.format(hessian_matrices))

2 frames
<ipython-input-3-5a1c492d2873> in hessian(y, x)
     21 
     22 def hessian(y, x):
---> 23     return jacobian(jacobian(y, x, create_graph=True), x)
     24 
     25 def f(x):

<ipython-input-3-5a1c492d2873> in jacobian(y, x, create_graph)
     15     for i in range(len(flat_y)):
     16         grad_y[i] = 1.
---> 17         grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
     18         jac.append(grad_x.reshape(x.shape))
     19         grad_y[i] = 0.

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in grad(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)
    155     return Variable._execution_engine.run_backward(
    156         outputs, grad_outputs, retain_graph, create_graph,
--> 157         inputs, allow_unused)
    158 
    159 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

2 Answers


I sincerely thought this was a bug in PyTorch, but after filing a bug report I got a great answer from albanD: https://github.com/pytorch/pytorch/issues/36903#issuecomment-616671247. He also pointed out that https://discuss.pytorch.org/ is available for asking questions.

The problem comes up because we traverse the computation graph again and again. Exactly what is going on here is beyond me though...

The in-place edits that your error message is referring to are the obvious ones: grad_y[i] = 1. and grad_y[i] = 0.. Re-using grad_y over and over in the computation is what causes the trouble. Redefining jacobian(...) as below works for me:

def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    for i in range(len(flat_y)):
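        # fresh buffer each iteration: no in-place edits to a tensor the graph has seen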
        grad_y = torch.zeros_like(flat_y)
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y, x, grad_y, retain_graph=True, create_graph=create_graph)
        jac.append(grad_x.reshape(x.shape))
    return torch.stack(jac).reshape(y.shape + x.shape)
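
To see the version-counter mechanics in isolation, here is a minimal sketch of my own (not from albanD's answer): an operation saves a tensor for its backward pass, that tensor is then edited in place, and backward reports the same kind of version mismatch.

import torch

a = torch.ones(2, requires_grad=True)
b = torch.zeros(2)
y = (a * b).sum()   # mul saves b, since b is needed for a's gradient
b[0] = 1.           # the in-place edit bumps b's version counter
try:
    y.backward()    # autograd notices that the saved b has changed
except RuntimeError as e:
    print(e)        # "... modified by an inplace operation ..."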

Something else that works, but is more like black magic to me, is leaving jacobian(...) as is and instead redefining f(x) as:

def f(x):
    return x * x * 1

This also works.
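
As a side note (not part of the original answer): recent PyTorch versions (1.5+) also ship torch.autograd.functional.hessian, which avoids hand-rolling the double Jacobian entirely. It expects a scalar-valued function, so the element-wise f above needs a sum:

import torch
from torch.autograd.functional import hessian as torch_hessian

def f_scalar(x):
    return (x * x).sum()   # hessian() wants a scalar-valued function

x = torch.rand(2)
print(torch_hessian(f_scalar, x))   # tensor([[2., 0.], [0., 2.]])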

answered 2020-04-20T20:33:22.533

For future readers: the RuntimeError in the title can show up in more general settings than the OP's, e.g. when moving tensor slices around and/or when operating on tensors coming from list comprehensions, since that is what led me here (this page was the first link my search engine returned for the RuntimeError).

To prevent this RuntimeError and make sure gradients can flow, the remedy that worked for me is mentioned in the link above (but missing from the solution message): calling the .clone() method on torch.Tensors (or slices of them) when moving them around.

For example:

some_container[slice_indices] = original_tensor[slice_indices].clone()

where only original_tensor has requires_grad=True, and subsequent (possibly batched) operations will be performed on the tensors in some_container.

Or:

some_container = [
    tensor.clone() 
    for tensor in some_tensor_list if some_condition_fn(tensor)
]
new_composed_tensor = torch.cat(some_container, dim=0)
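
For a concrete picture of why the .clone() matters, here is a minimal sketch with made-up names (not the answerer's code): exp() saves its output for backward, so an in-place edit to a slice of that output trips the version check, while an edit to a cloned slice does not.

import torch

def run(use_clone):
    w = torch.ones(4, requires_grad=True)
    hidden = w.exp()                       # exp() saves its output for backward
    part = hidden[:2].clone() if use_clone else hidden[:2]
    part += 1.0                            # without .clone(), this edits `hidden` in place
    hidden.sum().backward()

run(use_clone=True)        # fine: the slice was copied to fresh storage
try:
    run(use_clone=False)   # the slice is a view sharing storage with `hidden`
except RuntimeError as e:
    print(e)               # "... modified by an inplace operation ..."
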
answered 2021-01-08T21:58:53.297