(Edited to provide and explain a minimal reproducible example.)
I see the following error when backward hooks are used with pytorch-xla.
- The error does not occur when pytorch-xla is replaced with plain pytorch (a.k.a. pytorch cuda).
- The error is also not seen on pytorch-xla when the line that copies the gradient inside the backward hook is commented out (see the workaround sketch after the code below).
Traceback (most recent call last):
  File "test1.py", line 30, in <module>
    l.backward()
  File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Error while lowering: f32[1,2,16,16]{3,2,1,0} aten::constant_pad_nd, pad=[0, -1, 0, -1, 0, 0, 0, 0], value=0
XLA builder error: Invalid argument: The element types of the operands to Pad do not match.:
Python Frames:
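For what it's worth, the op named in the error looks like the backward of the ConstantPad2d layers: the backward of a constant pad is the same aten::constant_pad_nd op with the padding negated (i.e. a crop), which matches the pad=[0, -1, 0, -1, 0, 0, 0, 0] in the message. Below is a standalone sketch of that op via torch.nn.functional.pad; this is my assumption about what is being lowered, and I have not confirmed that this call fails on its own.

import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

x = torch.randn((1, 2, 16, 16), device=xm.xla_device(), dtype=torch.float)
# negative constant padding crops the tensor; this dispatches to aten::constant_pad_nd
y = F.pad(x, [0, -1, 0, -1], mode='constant', value=0)
print(y.cpu().shape)  # expected: torch.Size([1, 2, 15, 15])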
Minimal code to reproduce this error:
import torch
import torch_xla
import torch_xla.core.xla_model as xm
def loss(output, target):
    l = torch.sum(output - target)
    return l

model = torch.nn.Sequential(
    # minimal model to reproduce the error
    torch.nn.ConstantPad2d((0, 1, 0, 1), 0),
    torch.nn.Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2)),
    torch.nn.ConstantPad2d((0, 1, 0, 1), 0),
    torch.nn.Conv2d(2, 2, kernel_size=(3, 3), stride=(2, 2))
)
model = model.to(xm.xla_device())

def dummyHook(module, gradIn, gradOut):
    # the error is not seen if I comment out the line below
    g = gradOut[0].cpu()
    print(str(module))

x = torch.randn((1, 1, 32, 32), device=xm.xla_device(), dtype=torch.float)
target = torch.ones((1, 2, 8, 8), device=xm.xla_device(), dtype=torch.float)

for module in model.modules():
    module.register_backward_hook(dummyHook)

y = model(x)
l = loss(y, target)
l.backward()
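A possible workaround, sketched below under the assumption that the .cpu() call inside the hook forces XLA to lower and execute the partially built backward graph: keep the gradient on the XLA device inside the hook and transfer it to the host only after backward() has completed. This reuses model, x, target and loss from the snippet above; deferredHook and grads are names I made up for illustration, and I have not verified this against torch_xla internals.

grads = []

def deferredHook(module, gradIn, gradOut):
    # keep the gradient on the XLA device; no device-to-host copy here
    grads.append((str(module), gradOut[0]))

# register this hook instead of dummyHook above
for module in model.modules():
    module.register_backward_hook(deferredHook)

y = model(x)
l = loss(y, target)
l.backward()

# transfer the gradients to the host only after backward() has finished
for name, g in grads:
    print(name, g.cpu().shape)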
What could be going wrong?