pytorch - TorchScript trace fails with "must be on the current device" error even though model and inputs are on the same device

Question

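Despite my best efforts, I cannot get torch.jit.trace to run. It fails with RuntimeError: Input, output and indices must be on the current device

I have a (fairly complex) model that I have placed on the GPU, together with a set of inputs that are also on the GPU. I can verify that all input tensors, model parameters, and buffers are on the same device: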

(Pdb) {p.device for p in self.parameters()}
{device(type='cuda', index=0)}
(Pdb) {p.device for p in self.buffers()}
{device(type='cuda', index=0)}
(Pdb) in_ = (<several tensors here>)
(Pdb) {p.device for p in in_}
{device(type='cuda', index=0)}
(Pdb) torch.cuda.current_device()
0

I can also show that the model runs and that its output is on the correct device:

(Pdb) self(*in_).device
device(type='cuda', index=0)

Even so, tracing fails:

(Pdb) generator_script = torch.jit.trace(self, example_inputs=in_)
*** RuntimeError: Input, output and indices must be on the current device

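I understand the inputs and outputs, but what are these "indices" that must be on the same device?
What other factors that I have not considered could cause tracing to fail?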

score 0 · Accepted Answer

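After hardcoding the trace call into my code, I was able to get a more precise stack trace, which led me to the following piece of code (simplified here for readability):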

B, L, C, H, W = inp_seq.shape
ref_seq = torch.repeat_interleave(
    ref_seq.squeeze(dim=1),
    repeats=L,
    dim=0,
)

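During normal execution, L evaluates to a Python int, but using pdb I was able to determine that during tracing L becomes a Tensor. That should be fine, except that this tensor lives on the CPU, which causes the error.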
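One way to observe this without pdb is to record the type of L on each forward call, once in eager mode and once under torch.jit.trace. This is only a minimal CPU-only sketch: ShapeProbe and seen_types are hypothetical names made up for illustration, and what gets recorded during tracing may differ between PyTorch versions, so no expected output is claimed for that part.

```python
import torch

# Hypothetical helper list: collects the type name of L on every forward call.
seen_types = []


class ShapeProbe(torch.nn.Module):
    """Toy module (not the original model) that only inspects its input shape."""

    def forward(self, inp_seq):
        B, L, C, H, W = inp_seq.shape
        seen_types.append(type(L).__name__)
        # Return a tensor computed from the input so the tracer has an op to record.
        return inp_seq * 1


x = torch.zeros(2, 3, 4, 5, 5)
probe = ShapeProbe()

probe(x)  # eager call: L is a plain Python int
# check_trace=False avoids the extra verification run of the traced module.
torch.jit.trace(probe, (x,), check_trace=False)  # traced call: L may be a Tensor

print(seen_types)
```

If the second entry is not int, any code that passes L straight into an op expecting a device-matched tensor is a candidate for the same error.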

Casting L to an int is enough to get past this error:

B, L, C, H, W = inp_seq.shape
ref_seq = torch.repeat_interleave(
    ref_seq.squeeze(dim=1),
    repeats=int(L),
    dim=0,
)
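For completeness, here is a self-contained sketch of the same cast inside a small module that can be traced on the CPU. RepeatRef and the tensor sizes are assumptions made up for illustration; they are not taken from the real model.

```python
import torch


class RepeatRef(torch.nn.Module):
    """Toy module (hypothetical) that mirrors the simplified snippet above."""

    def forward(self, inp_seq, ref_seq):
        B, L, C, H, W = inp_seq.shape
        return torch.repeat_interleave(
            ref_seq.squeeze(dim=1),
            repeats=int(L),  # force a Python int even while tracing
            dim=0,
        )


# Made-up example sizes: B=2, L=3, C=4, H=W=5.
inp_seq = torch.zeros(2, 3, 4, 5, 5)
ref_seq = torch.ones(2, 1, 4, 5, 5)

model = RepeatRef()
traced = torch.jit.trace(model, (inp_seq, ref_seq))
out = traced(inp_seq, ref_seq)
print(out.shape)
```

One caveat, as far as I understand the tracer: calling int() on a traced value usually emits a TracerWarning and records L as a constant, so the traced module may only be valid for inputs with the same sequence length as the example inputs.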

However, this feels like a bug, or at least a missing feature in PyTorch: why does inp_seq.shape yield CPU tensors when inp_seq is on the GPU? I am currently using torch 1.8.1+cu101.
