pytorch - Max pooling of an image in PyTorch

Question

I am trying to apply maxpool2d (from torch.nn) to a single image (not as a maxpool layer). This is my current code:

name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))

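The problem is that this code gives me a very green image. I am sure the problem is in the dimensions passed to view, but I couldn't find out how to apply maxpool to just one image, so I can't fix it. The image I am working with is 512x512. The arguments to view make no sense to me right now; they are simply the only numbers that produce a result...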

For example, if I pass 512,512 as the arguments to view, I get the following error:

RuntimeError: shape '[512, 512]' is invalid for input of size 261120

If anyone could tell me how to apply maxpool, avgpool, or minpool to an image and display the result, I would be really grateful!

Thanks (:

score 2 · Accepted Answer

Assuming your image is a numpy.array when loaded (see the comments for an explanation of each step):

import numpy as np
import torch

# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
# (`high` is exclusive, so 256 covers the full 0-255 range)
numpy_img = np.random.randint(low=0, high=256, size=(512, 512, 3))

# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in format Channels, Height, Width
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]

# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you    
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]

pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)

# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())

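If your image is black and white, you will need the shape [1, 1, 512, 512] (a single channel only). You cannot drop or squeeze these dimensions; torch.nn.Module layers expect them to be there! (Newer PyTorch releases document the pooling layers as also accepting unbatched [C, H, W] input, but the 4-D form works everywhere.)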
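To see where the 261120 in the question's error comes from, here is a small sketch of the shape arithmetic. It assumes the question's image was a channels-last array of shape [512, 512, 3]; `pooled_size` is a helper name made up for this illustration, and zero tensors stand in for the real image.

```python
import torch

def pooled_size(size, kernel_size, stride=1, padding=0):
    # Output length along one spatial axis (no dilation, floor rounding)
    return (size + 2 * padding - kernel_size) // stride + 1

pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)

# Channels-last image fed in as-is: PyTorch treats it as 512 channels
# of a 512 x 3 image, so the window slides over the last two axes only.
channels_last = torch.zeros(512, 512, 3).unsqueeze(0)   # [1, 512, 512, 3]
wrong = pooling(channels_last)
print(wrong.shape)    # torch.Size([1, 512, 510, 1])
print(wrong.numel())  # 261120, which is why view((512, 510)) was the only shape that fit

# Channels-first image: pooling runs over height and width as intended.
channels_first = torch.zeros(1, 3, 512, 512)
right = pooling(channels_first)
print(right.shape)    # torch.Size([1, 3, 510, 510])
print(pooled_size(512, kernel_size=3, stride=1))  # 510
```

So the green picture was most likely the result of pooling across the color axis rather than across the pixels.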

To turn the tensor back into an image, you can use similar steps:

# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)

# Unpermute back to Height, Width, Channels
height_width_channels = no_batch.permute(1, 2, 0)
height_width_channels.shape  # Shape: [510, 510, 3]

# Cast to numpy and you have your image
final_image = height_width_channels.numpy()
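The question also asks about avgpool and minpool. The sketch below wraps the same steps in one function; `pool_image` and its `mode` argument are names invented here, not part of PyTorch. It assumes a channels-last (or 2-D grayscale) array with values in 0-255. As far as I know torch.nn has no min-pooling layer, so the "min" branch negates the input, max-pools it, and negates the result.

```python
import numpy as np
import torch
import torch.nn.functional as F

def pool_image(numpy_img, mode="max", kernel_size=3, stride=1):
    """Pool an [H, W, C] or [H, W] image; returns a uint8 array in the same layout."""
    img = torch.from_numpy(np.asarray(numpy_img)).float()
    grayscale = img.dim() == 2
    if grayscale:
        img = img.unsqueeze(-1)                  # [H, W] -> [H, W, 1]
    batch = img.permute(2, 0, 1).unsqueeze(0)    # [1, C, H, W]

    if mode == "max":
        out = F.max_pool2d(batch, kernel_size, stride)
    elif mode == "avg":
        out = F.avg_pool2d(batch, kernel_size, stride)
    elif mode == "min":
        # min(x) == -max(-x), applied per window
        out = -F.max_pool2d(-batch, kernel_size, stride)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    out = out.squeeze(0).permute(1, 2, 0)        # back to [H', W', C]
    if grayscale:
        out = out.squeeze(-1)
    # Round averages and return 8-bit values, which image viewers expect
    return out.round().clamp(0, 255).to(torch.uint8).numpy()

demo = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(pool_image(demo, "max"))  # [[10 11] [14 15]]
print(pool_image(demo, "min"))  # [[0 1] [4 5]]
print(pool_image(demo, "avg"))  # [[5 6] [9 10]]
```

With matplotlib, something like plt.imshow(pool_image(images[name], "min")) should then display the result, assuming images[name] is the array from the question.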
