I implemented the following Jacobian function in PyTorch. Unless I have made a mistake somewhere, it computes the Jacobian of any tensor with respect to an input of any dimensionality:
import torch
import torch.autograd as ag
def nd_range(stop, dims=None):
    # Recursively yield every index tuple of a shape `stop`.
    if dims is None:
        dims = len(stop)
    if not dims:
        yield ()
        return
    for outer in nd_range(stop, dims - 1):
        for inner in range(stop[dims - 1]):
            yield outer + (inner,)
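
For reference, nd_range simply enumerates every index tuple of a given shape, e.g.:

>>> list(nd_range((2, 3)))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]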
def full_jacobian(f, wrt):
    f_shape = list(f.size())
    wrt_shape = list(wrt.size())
    fs = []

    f_range = nd_range(f_shape)
    wrt_range = nd_range(wrt_shape)

    # Differentiate each scalar entry of f with respect to wrt;
    # create_graph=True keeps the result differentiable again.
    for f_ind in f_range:
        grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
        for i in range(len(f_shape)):
            grad = grad.unsqueeze(0)
        fs.append(grad)

    fj = torch.cat(fs, dim=0)
    fj = fj.view(f_shape + wrt_shape)
    return fj
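
As a quick sanity check of the output shape (a minimal sketch; x and the elementwise square are just an illustrative example, not my actual use case):

x = torch.randn(3, requires_grad=True)
y = x ** 2                   # elementwise square, shape (3,)
J = full_jacobian(y, x)      # Jacobian, shape (3, 3); should equal diag(2 * x)
print(J.size())              # torch.Size([3, 3])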
On top of that, I tried to implement a recursive function to compute nth-order derivatives:
def nth_derivative(f, wrt, n):
    # Apply full_jacobian n times; each call appends wrt's dimensions to the result.
    if n == 1:
        return full_jacobian(f, wrt)
    else:
        deriv = nth_derivative(f, wrt, n - 1)
        return full_jacobian(deriv, wrt)
I ran a simple test:
# s is assumed to be a 1-D tensor created with requires_grad=True
op = torch.ger(s, s)                    # outer product of s with itself
deep_deriv = nth_derivative(op, s, 5)
Unfortunately, this successfully gets me the Hessian... but none of the higher-order derivatives. I am aware that many of the higher-order derivatives should be 0, but I would prefer it if PyTorch could compute that analytically.
One workaround is to change the gradient computation to:
try:
    grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
except RuntimeError:
    # grad() raises once the entry no longer depends on wrt,
    # so fall back to an explicit zero gradient of the right shape.
    grad = torch.zeros_like(wrt)
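
Concretely, that just means the inner loop of full_jacobian becomes (same logic as above, shown in place):

    for f_ind in f_range:
        try:
            grad = ag.grad(f[tuple(f_ind)], wrt, retain_graph=True, create_graph=True)[0]
        except RuntimeError:
            # this entry no longer depends on wrt, so its gradient is zero
            grad = torch.zeros_like(wrt)
        for i in range(len(f_shape)):
            grad = grad.unsqueeze(0)
        fs.append(grad)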
Is this the accepted, correct way to handle this? Or is there a better option? Or do I have the reason for my problem entirely wrong to begin with?