
I am reading the PyTorch tutorial on defining new autograd functions. The autograd function I want to implement is a wrapper around torch.nn.functional.max_pool1d. Here is what I have so far:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):

    @staticmethod
    def forward(ctx, input, kernel_size, stride=None, padding=0, dilation=1, \
                return_indices=False, ceil_mode=False):
        ctx.save_for_backward( input )

        inputC = input.clone() #copy input
        inputC *= inputC

        output = F.max_pool1d(inputC, kernel_size, stride=stride, \
                              padding=padding, dilation=dilation, \
                              return_indices=return_indices, \
                              ceil_mode=ceil_mode)

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = get_max_pool1d_grad_somehow(grad_output)
        return 2.0*input*grad_input

My question is: how do I get the gradient of the wrapped function? I know there are probably other ways to do this, given how simple my example is, but what I want to do fits this framework and requires me to implement an autograd function.

Edit: After looking at this blog post I decided to try the following for backward:

def backward(ctx, grad_output):
    input, output = ctx.saved_tensors
    grad_input = output.backward(grad_output)
    return 2.0*input*grad_input

with output added to the saved variables. I then ran the following code:

x = np.random.randn(1,1,5)
xT = torch.from_numpy(x)
xT.requires_grad=True
f = SquareAndMaxPool1d.apply
s = torch.sum(f(xT,2))
s.backward()

and I get Bus error: 10.

If xT is tensor([[[ 1.69533562, -0.21779421, 2.28693953, -0.86688095, -1.01033497]]], dtype=torch.float64), then after calling s.backward() I expect to find that xT.grad contains tensor([[[ 3.39067124, -0. , 9.14775812, -0. , -2.02066994]]], dtype=torch.float64), i.e. 2*x*grad_of_max_pool, with grad_of_max_pool containing tensor([[[1., 0., 2., 0., 1.]]], dtype=torch.float64).

I have figured out why I get a Bus error: 10. The code above apparently leads to a recursive call of my backward at grad_input = output.backward(grad_output). So I need to find some other way to get the gradient of max_pool1d. I know how to implement this in pure Python, but the result would be much slower than if I could wrap the library code.
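For reference, a naive pure-Python fallback along the lines alluded to above might look like the sketch below (this is my own illustration, not part of the original post; it assumes the default stride equal to kernel_size, no padding and no dilation). The per-window Python loop is exactly what makes it much slower than the library code.

import torch

def naive_max_pool1d_backward(grad_output, input, kernel_size):
    # hypothetical helper, illustration only: routes each pooled gradient
    # back to the position of the maximum inside its pooling window
    stride = kernel_size  # assumption: default stride
    grad_input = torch.zeros_like(input)
    n_windows = grad_output.shape[-1]
    for b in range(input.shape[0]):
        for c in range(input.shape[1]):
            for w in range(n_windows):
                start = w * stride
                window = input[b, c, start:start + kernel_size]
                argmax = start + torch.argmax(window)
                grad_input[b, c, argmax] += grad_output[b, c, w]
    return grad_input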


2 Answers


You have picked a rather unfortunate example. torch.nn.functional.max_pool1d is not an instance of torch.autograd.Function, because it is a PyTorch built-in, defined in C++ code with an autogenerated Python binding. I am not sure if it is possible to get the backward property via its interface.
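A quick way to see this (my addition, not part of the original answer):

import torch.nn.functional as F
import torch.autograd as tag

print(isinstance(F.max_pool1d, tag.Function))  # False: not a Function subclass instance
print(type(F.max_pool1d))                      # a plain function/builtin, depending on the PyTorch version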

Firstly, in case you haven't noticed, you don't need to write any custom code for the backward pass of this formula, because both the power operation and max_pool1d already have it defined, so their composition is also covered by autograd. Assuming your goal is an exercise, I would suggest you do it more manually (without falling back to the backward of max_pool1d). An example is below:

import torch
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):
    @staticmethod
    def forward(ctx, input, kernel_size, **kwargs):
        # we're gonna need indices for backward. Currently SquareAnd...
        # never actually returns indices, I left it out for simplicity
        kwargs['return_indices'] = True

        input_sqr = input ** 2
        output, indices = F.max_pool1d(input_sqr, kernel_size, **kwargs)
        ctx.save_for_backward(input, indices)

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, indices = ctx.saved_tensors

        # first we need to reconstruct the gradient of `max_pool1d`
        # by putting all the output gradient elements (corresponding to
        # input elements which made it through the max_pool1d) in their
        # respective places, the rest has gradient of 0. We do it by
        # scattering it against a tensor of 0s
        grad_output_unpooled = torch.zeros_like(input)
        grad_output_unpooled.scatter_(2, indices, grad_output)

        # then incorporate the gradient of the "square" part of your
        # operator
        grad_input = 2. * input * grad_output_unpooled

        # the docs for backward
        # https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function.backward
        # say that "it should return as many tensors, as there were inputs
        # to forward()". It fails to mention that if an argument was not a
        # tensor, it should return None (I remember reading this somewhere,
        # but can't find it anymore). Anyway, we need to
        # return a (grad_input, None) tuple to avoid a complaint that two
        # outputs were expected
        return grad_input, None

We can then use the numerical gradient checker to verify that the operation works as expected:

f = SquareAndMaxPool1d.apply
xT = torch.randn(1, 1, 6, requires_grad=True, dtype=torch.float64)
tag.gradcheck(lambda t: f(t, 2), xT)
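As an extra sanity check (my addition, not part of the original answer), the custom Function's gradient can also be compared against the plain autograd composition mentioned above:

out_custom = f(xT, 2).sum()
grad_custom, = tag.grad(out_custom, xT)

out_plain = F.max_pool1d(xT ** 2, 2).sum()
grad_plain, = tag.grad(out_plain, xT)

print(torch.allclose(grad_custom, grad_plain))  # expected: True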

I'm sorry if this doesn't answer your question of how to get the backward of max_pool1d, but hopefully you find my answer useful enough.

Answered 2019-02-08T11:26:43.643

The problem you had with the recursive calls actually comes from output and the fact that, by default, a with no_grad behavior seems to be inherited in the class declaration from torch.autograd.Function. If you check output.grad_fn in forward, it will probably be None, and in backward it will probably link to the function object <SquareAndMaxPool1d...>, which causes the recursive calls. If you are still interested in how to do exactly what you asked, here is an example with F.linear:

import torch
import torch.nn as nn
import torch.nn.functional as F

class custom_Linear(nn.Linear):
    def forward(self, _input):
        return Custom_Linear_AGfn_getAround.apply(_input, self.weight, self.bias)

class Custom_Linear_AGfn_getAround(torch.autograd.Function):
    @staticmethod
    def forward(ctx, _input, _weight, _bias):
        print('Custom forward')
        with torch.enable_grad():
            detached_input = _input.detach()
            detached_input.requires_grad_(True)
            detached_weight = _weight.detach()
            detached_weight.requires_grad_(True)
            detached_bias = _bias.detach()
            detached_bias.requires_grad_(True)
            _tmp = F.linear(detached_input, detached_weight, detached_bias)
        ctx.saved_input = detached_input
        ctx.saved_param = detached_weight, detached_bias
        ctx.save_for_backward(_tmp)
        _output = _tmp.detach()
        return _output

    @staticmethod
    def backward(ctx, grad_out):
        print('Custom backward')
        _tmp, = ctx.saved_tensors
        _weight, _bias = ctx.saved_param
        detached_input = ctx.saved_input
        with torch.enable_grad():
            _tmp.backward(grad_out)
        return detached_input.grad, _weight.grad, _bias.grad

Basically, it just builds a small isolated graph for the part of interest without messing with the main graph, and uses grad_fn and requires_grad to keep track of the graphs when deciding what to detach and what the isolated graph needs.
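A quick usage sketch (my addition, not from the original answer), assuming the two classes above are defined:

layer = custom_Linear(4, 3)
x = torch.randn(2, 4, requires_grad=True)

out = layer(x)           # prints 'Custom forward'
out.sum().backward()     # prints 'Custom backward'

print(x.grad.shape)             # torch.Size([2, 4])
print(layer.weight.grad.shape)  # torch.Size([3, 4])
print(layer.bias.grad.shape)    # torch.Size([3])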

About the tricky parts:

  • Detaching the weight and bias: you can do without it, but then either you pass _weight and _bias through save_for_backward, in which case _weight.grad and _bias.grad will be None inside backward but will have their correct values once outside of it, or you pass them through an attribute such as ctx.saved_param, in which case you have to manually return None for the last two return values of backward (return detached_input.grad, None, None), otherwise you will get twice the correct value when you check the weight and bias gradients afterwards.
  • As said at the beginning, backward and forward of a class inherited from torch.autograd.Function seem to have a with no_grad behavior by default. Thus, removing with torch.enable_grad(): in the code above results in _tmp having grad_fn set to None; I could not understand why _tmp had grad_fn None and requires_grad False in forward by default, even though detached_input requires gradients, until I ran into https://github.com/pytorch/pytorch/issues/7698 (see the small illustration after this list).
  • I believe, but I have not checked, that you may get a double grad_fn for _output if you do not detach it: when with torch.enable_grad() is dropped and the output is not detached, _tmp.grad_fn is None in forward but shows up as <Custom_Linear_AGfn_getAround...> in backward, which leads to the infinite recursive calls.
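A minimal illustration of the grad_fn behavior described in the second bullet (my addition, with made-up tensor names):

import torch

w = torch.randn(3, requires_grad=True)

with torch.no_grad():                      # mimics the implicit no-grad inside forward()
    plain = w * 2                          # no graph is recorded here
    with torch.enable_grad():              # what the answer's forward() does explicitly
        leaf = w.detach().requires_grad_(True)
        tracked = leaf * 2                 # recorded in a small isolated graph

print(plain.grad_fn)                # None -> cannot be backpropagated through later
print(tracked.grad_fn is not None)  # True -> the isolated graph exists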
Answered 2020-01-13T12:07:58.167