python - What is the output tensor of a Max Pooling 2D layer in TensorFlow?

Question

I'm trying to understand some basics of TensorFlow and got stuck while reading the documentation for the max pooling 2D layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1

This is how max_pooling2d is specified:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

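where conv1 is a tensor with shape [batch_size, image_width, image_height, channels], in this case specifically [batch_size, 28, 28, 32].

So our input is a tensor with shape [batch_size, 28, 28, 32].

My understanding of a max pooling 2D layer is that it applies a filter of size pool_size (2x2 in this case) and moves the sliding window by stride (also 2x2). This means that both the width and the height of the image are halved, i.e. we end up with 14x14 pixels per channel (32 channels in total), so our output is a tensor with shape [batch_size, 14, 14, 32].

However, according to the link above, the shape of the output tensor is [batch_size, 14, 14, 1]: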

Our output tensor produced by max_pooling2d() (pool1) has a shape of 
[batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.

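What am I missing here?

How does 32 get converted to 1?

They apply the same logic later on here: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2

but this time it is correct, i.e. [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (the number of channels stays the same).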
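The shape arithmetic described in the question can be checked without TensorFlow. The sketch below is only an illustration: pooled_shape is a hypothetical helper, not a TensorFlow function, and it assumes the channels-last layout used above together with padding='valid', which is the default for tf.layers.max_pooling2d.

```python
def pooled_shape(input_shape, pool_size, strides):
    """Output shape of 2D max pooling on a channels-last tensor.

    Assumes 'valid' padding: windows that do not fit entirely inside
    the input are dropped. pooled_shape is a hypothetical helper name.
    """
    batch, height, width, channels = input_shape
    out_height = (height - pool_size[0]) // strides[0] + 1
    out_width = (width - pool_size[1]) // strides[1] + 1
    # Pooling works on each channel independently, so the channel
    # count is passed through unchanged.
    return (batch, out_height, out_width, channels)


# First pooling layer from the tutorial (None stands for batch_size).
print(pooled_shape((None, 28, 28, 32), (2, 2), (2, 2)))  # (None, 14, 14, 32)
# Second pooling layer from the tutorial.
print(pooled_shape((None, 14, 14, 64), (2, 2), (2, 2)))  # (None, 7, 7, 64)
```

Only the two spatial dimensions enter the formula, which is why a channel count of 32 cannot turn into 1 through pooling alone.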

score 3 · Accepted Answer

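Yes, a 2x2 max pool with strides=2x2 halves both the width and the height, and the output depth does not change. Here is my test code based on yours; the output shape is (14, 14, 32), so perhaps the tutorial is wrong?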

#!/usr/bin/env python

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

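# Stand-in for the output of the first convolutional layer: 28x28 feature maps, 32 channels.
conv1 = tf.placeholder(tf.float32, [None, 28, 28, 32])
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
# print() with parentheses works on both Python 2 and Python 3.
print(pool1.get_shape())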

The output is:

Extracting ./MNIST_data/train-images-idx3-ubyte.gz
Extracting ./MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./MNIST_data/t10k-labels-idx1-ubyte.gz
(?, 14, 14, 32)
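To see why the depth stays at 32, it may help to look at what the pooling operation does to each channel. The following is a minimal pure-Python sketch, not TensorFlow's implementation: max_pool_2x2 and shape_of are hypothetical helpers, the input is a nested list laid out as [batch][height][width][channels], and a 2x2 window with stride 2 and 'valid' padding is assumed.

```python
def max_pool_2x2(batch):
    """2x2 max pooling with stride 2 over nested lists.

    Layout assumed: [batch][height][width][channels].
    Hypothetical helper for illustration only.
    """
    pooled = []
    for image in batch:
        height, width, channels = len(image), len(image[0]), len(image[0][0])
        out_image = []
        # Step through the top-left corner of each full 2x2 window.
        for row in range(0, height - 1, 2):
            out_row = []
            for col in range(0, width - 1, 2):
                # The maximum is taken separately for every channel,
                # so the number of channels is preserved.
                out_row.append([
                    max(image[row + dr][col + dc][ch]
                        for dr in (0, 1) for dc in (0, 1))
                    for ch in range(channels)
                ])
            out_image.append(out_row)
        pooled.append(out_image)
    return pooled


def shape_of(nested):
    """Shape of a regular nested list, read along the first elements."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# One 4x4 image with 3 channels; channel ch is offset by 100 * ch.
batch = [[[[r * 4 + c + 100 * ch for ch in range(3)]
           for c in range(4)]
          for r in range(4)]]
pooled = max_pool_2x2(batch)
print(shape_of(batch))   # (1, 4, 4, 3)
print(shape_of(pooled))  # (1, 2, 2, 3)
print(pooled[0][0][0])   # [5, 105, 205]
```

The window slides only over height and width, so each output position still carries one value per input channel.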

score 1 · Accepted Answer

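Nikola, the tutorial has since been corrected to match your understanding.

I came across this thread while learning the concepts of convolution and pooling. Thanks for your question, which led me to the informative documentation.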
