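I am fine-tuning Faster R-CNN with PyTorch, following this tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
The results are quite good, but prediction only works when I feed the model a single tensor. For example: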
# This works well
>>> img, _ = dataset_test[3]
>>> img.shape
torch.Size([3, 1200, 1600])
>>> model.eval()
>>> with torch.no_grad():
...     preds = model([img.to(device)])
But when I feed it several tensors at once, I get this error:
>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = torch.stack([dataset_test[idx][0] for idx in random_idx])
>>> images.shape
torch.Size([4, 3, 1200, 1600])
>>> with torch.no_grad():
...     preds = model(images.to(device))
RuntimeError Traceback (most recent call last)
<ipython-input-101-52caf8fee7a4> in <module>()
5 model.eval()
6 with torch.no_grad():
----> 7 prediction = model(images.to(device))
...
RuntimeError: The expanded size of the tensor (1600) must match the existing size (1066) at non-singleton dimension 2. Target sizes: [3, 1200, 1600]. Tensor sizes: [3, 800, 1066]
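A message of this form is what PyTorch raises when a tensor is copied into a slot of a different spatial size. The sketch below is only a guess at the mechanism, not a trace of what the model does internally: it assumes that something resizes one image and writes it back into the 4D batch in place. The sizes are small stand-ins for [3, 1200, 1600] and [3, 800, 1066].

```python
import torch

# Stand-in for the stacked 4D batch of images.
batch = torch.zeros(2, 3, 12, 16)
# Stand-in for one image after it has been resized to a smaller size.
resized = torch.zeros(3, 8, 10)

message = ""
try:
    # In-place assignment needs `resized` to broadcast to [3, 12, 16], which it cannot.
    batch[0] = resized
except RuntimeError as err:
    message = str(err)

print(message)
```

A Python list has no such constraint, since each element can be rebound to a tensor of any shape.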
Edit
It works when I pass a list of 3D tensors (IMO this behavior is a bit odd; I don't understand why it doesn't work with a 4D tensor):
>>> random_idx = torch.randint(high=50, size=(4,))
>>> images = [dataset_test[idx][0].to(device) for idx in random_idx]
>>> len(images)
4
>>> with torch.no_grad():
...     preds = model(images)
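For what it's worth, the torchvision detection models document their input as a list of tensors, each of shape [C, H, W], because images in one batch may differ in size. If the images are already stacked into one 4D tensor, a minimal sketch for turning it back into such a list could look like this (`batch_to_image_list` is a hypothetical helper name, not part of torchvision):

```python
import torch

def batch_to_image_list(batch):
    """Split a [N, C, H, W] tensor into a list of N tensors of shape [C, H, W]."""
    if batch.dim() != 4:
        raise ValueError(f"expected a 4D tensor, got {batch.dim()}D")
    # torch.unbind returns a tuple of views along dim 0; no data is copied.
    return list(torch.unbind(batch, dim=0))

# Small stand-in for the stacked test images (the real ones are [4, 3, 1200, 1600]).
batch = torch.rand(4, 3, 12, 16)
image_list = batch_to_image_list(batch)
print(len(image_list), image_list[0].shape)
```

With the model from the question, the call would then presumably be `preds = model(batch_to_image_list(images.to(device)))`.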