(Edited to provide and explain a minimal reproducible example.)
I see the following error when backward hooks are used with pytorch-xla.
- The error does not occur when pytorch-xla is replaced with plain pytorch (a.k.a. pytorch cuda).
- The error is also not seen on pytorch-xla when the line that copies the gradient inside the backward hook is commented out (see the workaround sketch after the code below).
Traceback (most recent call last):
  File "test1.py", line 30, in <module>
    l.backward()
  File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/anaconda3/envs/torch-xla-nightly/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Error while lowering: f32[1,2,16,16]{3,2,1,0} aten::constant_pad_nd, pad=[0, -1, 0, -1, 0, 0, 0, 0], value=0
XLA builder error: Invalid argument: The element types of the operands to Pad do not match.:
Python Frames:
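For what it's worth, the op named in the error looks like the backward of the ConstantPad2d layers: the backward of a constant pad is the same aten::constant_pad_nd op with the padding negated (i.e. a crop), which matches the pad=[0, -1, 0, -1, 0, 0, 0, 0] in the message. Below is a standalone sketch of that op via torch.nn.functional.pad; this is my assumption about what is being lowered, and I have not confirmed that this call fails on its own.

import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

x = torch.randn((1, 2, 16, 16), device=xm.xla_device(), dtype=torch.float)
# negative constant padding crops the tensor; this dispatches to aten::constant_pad_nd
y = F.pad(x, [0, -1, 0, -1], mode='constant', value=0)
print(y.cpu().shape)  # expected: torch.Size([1, 2, 15, 15])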
Minimal code to reproduce this error:
import torch
import torch_xla
import torch_xla.core.xla_model as xm
def loss(output, target):
    l = torch.sum(output - target)
    return l

model = torch.nn.Sequential(
    # minimal model to reproduce the error
    torch.nn.ConstantPad2d((0, 1, 0, 1), 0),
    torch.nn.Conv2d(1, 2, kernel_size=(3, 3), stride=(2, 2)),
    torch.nn.ConstantPad2d((0, 1, 0, 1), 0),
    torch.nn.Conv2d(2, 2, kernel_size=(3, 3), stride=(2, 2))
)
model = model.to(xm.xla_device())

def dummyHook(module, gradIn, gradOut):
    # the error is not seen if I comment out the line below
    g = gradOut[0].cpu()
    print(str(module))

x = torch.randn((1, 1, 32, 32), device=xm.xla_device(), dtype=torch.float)
target = torch.ones((1, 2, 8, 8), device=xm.xla_device(), dtype=torch.float)

for module in model.modules():
    module.register_backward_hook(dummyHook)

y = model(x)
l = loss(y, target)
l.backward()
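A possible workaround, sketched below under the assumption that the .cpu() call inside the hook forces XLA to lower and execute the partially built backward graph: keep the gradient on the XLA device inside the hook and transfer it to the host only after backward() has completed. This reuses model, x, target and loss from the snippet above; deferredHook and grads are names I made up for illustration, and I have not verified this against torch_xla internals.

grads = []

def deferredHook(module, gradIn, gradOut):
    # keep the gradient on the XLA device; no device-to-host copy here
    grads.append((str(module), gradOut[0]))

# register this hook instead of dummyHook above
for module in model.modules():
    module.register_backward_hook(deferredHook)

y = model(x)
l = loss(y, target)
l.backward()

# transfer the gradients to the host only after backward() has finished
for name, g in grads:
    print(name, g.cpu().shape)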
What could be going wrong?