U-Net's input image size is 572*572, but the output mask size is 388*388. How can the image be masked by a smaller mask?
1 Answer
You are probably referring to the paper by Ronneberger et al. in which the U-Net architecture was published. The architecture figure there shows these numbers.
The explanation is a bit hidden in section "3. Training" of the paper:
Due to the unpadded convolutions, the output image is smaller than the input by a constant border width.
This means that each convolution "crops" part of the image: the kernel starts at a position where it fully overlaps the layer's input (the input image or input blob), so no output is computed for the border pixels. For a 3x3 convolution this is always one pixel on each side, i.e. two pixels per dimension. For a more visual explanation of kernels/convolutions see e.g. here. The output is smaller because, with unpadded convolutions, only the inner part of the image gets a result.
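To see how this per-convolution loss adds up to 572 → 388, the sizes can be traced layer by layer. The sketch below is only an illustration and assumes the layout described in the paper: four down/up levels, two unpadded 3x3 convolutions per block, 2x2 max pooling and 2x2 up-convolutions. The function name and its parameters are made up for this example.

```python
def unet_output_size(input_size, depth=4, convs_per_block=2, kernel=3):
    """Trace the spatial size through a U-Net with unpadded convolutions."""
    shrink = kernel - 1  # an unpadded k x k convolution removes k - 1 pixels
    size = input_size

    # Contracting path: convolutions, then 2x2 max pooling (halves the size).
    for _ in range(depth):
        size -= convs_per_block * shrink
        if size % 2 != 0:
            raise ValueError("size before pooling must be even")
        size //= 2

    # Bottleneck: one more block of convolutions.
    size -= convs_per_block * shrink

    # Expanding path: 2x2 up-convolution (doubles the size), then convolutions.
    for _ in range(depth):
        size *= 2
        size -= convs_per_block * shrink

    # The final 1x1 convolution does not change the spatial size.
    return size


print(unet_output_size(572))  # 388
```

The intermediate values are 568 → 284 → 280 → 140 → 136 → 68 → 64 → 32 → 28 on the way down and 56 → 52 → 104 → 100 → 200 → 196 → 392 → 388 on the way up.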
This shrinking is not a general characteristic of the architecture but something inherent to (unpadded) convolutions, and it can be avoided with padding. A common strategy is mirroring at the image borders, so that each convolution can start at the very edge of the image (and sees mirrored pixels wherever its kernel extends past the border). Then the input size is preserved and the full image is segmented.
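As a rough illustration of mirroring, here is a small pure-Python sketch that reflect-pads a 2D image (a list of rows) so that a following unpadded 3x3 convolution returns the original size. The helper names are hypothetical; in NumPy the same padding is available as np.pad(image, 1, mode="reflect").

```python
def reflect_pad(image, pad):
    """Mirror-pad a 2D list of rows by `pad` pixels on every side.

    The edge pixel itself is not repeated. Requires pad < len(image)
    and pad < len(image[0]).
    """
    def pad_1d(seq):
        left = seq[1:pad + 1][::-1]      # mirrored values next to the first element
        right = seq[-pad - 1:-1][::-1]   # mirrored values next to the last element
        return left + seq + right

    rows = [pad_1d(row) for row in image]  # pad left and right
    return pad_1d(rows)                    # pad top and bottom


def valid_conv_size(size, kernel=3):
    """Output size of an unpadded convolution along one dimension."""
    return size - (kernel - 1)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = reflect_pad(image, 1)

for row in padded:
    print(row)
# [5, 4, 5, 6, 5]
# [2, 1, 2, 3, 2]
# [5, 4, 5, 6, 5]
# [8, 7, 8, 9, 8]
# [5, 4, 5, 6, 5]

print(valid_conv_size(len(padded)))  # 3, the same as the unpadded input
```

Padding by one pixel before every 3x3 convolution cancels the one-pixel loss on each side, which is why the output can then match the input size.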