python - Tensorflow sigmoid keeps saturating

Question

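I have an environment and a reward design that I specifically built to stay close to -1, 0 and 1, so that (as far as I know) the sigmoid should not saturate. I also kept the reward design fairly simple, with rewards of about -1 and 1 for the final goal.

I use DDPG with 250 neurons in my hidden layer (this varies, I have tested many values, but let's stick with this number for now). Lr = 0.001, memory size = 300, Gamma = 0.9, Epsilon = 0.18.

This is my actor network: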

    # Assumes `import tensorflow as tf` (TF 1.x API) at module level.
    def _build_a(self, s, reuse=None, custom_getter=None):
        # Variables are trainable only when the scope is not being reused.
        trainable = reuse is None
        with tf.variable_scope('Actor', reuse=reuse, custom_getter=custom_getter):
            net = tf.layers.dense(s, 250, activation=tf.nn.tanh, name='l1', trainable=trainable)
            a = tf.layers.dense(net, 3, name='a', trainable=trainable)
            # return tf.nn.softmax(a, name='scaled_a')
            return tf.nn.sigmoid(a, name='scaled_a')
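
To make "saturating" concrete for the output of layer `a`: an actor output of 0.999 corresponds to a pre-activation of roughly 6.9, where the sigmoid's slope is only about 0.001. The following is a minimal sketch in plain Python; the helper names `sigmoid`, `sigmoid_grad` and `logit` are illustrative and not part of the network code above.

```python
import math

def sigmoid(z):
    """Logistic function, the same elementwise mapping as tf.nn.sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the logistic function: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def logit(p):
    """Inverse of sigmoid: the pre-activation that produces output p."""
    return math.log(p / (1.0 - p))

# Compare an unsaturated input (0.0) with the input that yields 0.999.
for z in (0.0, 2.0, logit(0.999), 10.0):
    print(f"z={z:6.2f}  sigmoid={sigmoid(z):.4f}  grad={sigmoid_grad(z):.6f}")
```

Because the slope shrinks with the output's distance from 0.5, once the pre-activations of `a` drift to large magnitudes, very little gradient flows back through the sigmoid to pull them in again.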

This is my critic network:

    def _build_c(self, s, a, reuse=None, custom_getter=None):
        # Variables are trainable only when the scope is not being reused.
        trainable = reuse is None
        with tf.variable_scope('Critic', reuse=reuse, custom_getter=custom_getter):
            # The first layer is as wide as the state vector.
            n_l1 = s.shape[1]
            w1_s = tf.get_variable('w1_s', [s.get_shape()[1], n_l1], trainable=trainable)
            w1_a = tf.get_variable('w1_a', [3, n_l1], trainable=trainable)
            b1 = tf.get_variable('b1', [1, n_l1], trainable=trainable)
            # State and action contributions are summed before the ReLU.
            net = tf.nn.relu(tf.matmul(s, w1_s) + tf.matmul(a, w1_a) + b1)
            return tf.layers.dense(net, 1, trainable=trainable)  # Q(s,a)

As mentioned, my rewards are around -1 and 1, and my state looks like this (part of it is one-hot encoded):

[ 0.          1.          0.         -0.57726974  0.45491466  2.04893833
 -0.7697888  -0.57952472 -0.57726974 -0.44017265 -0.94382348  1.38399613]
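
A quick way to sanity-check the scale of such a state is to separate the one-hot part from the continuous part and look at the value range. This is only a sketch: it assumes the first three entries are the one-hot block (a reading of the sample above, not something confirmed here), and `N_ONE_HOT` and `is_one_hot` are hypothetical names.

```python
# Sample state copied from above.
state = [0.0, 1.0, 0.0, -0.57726974, 0.45491466, 2.04893833,
         -0.7697888, -0.57952472, -0.57726974, -0.44017265, -0.94382348, 1.38399613]

N_ONE_HOT = 3  # assumption: the first three entries form the one-hot block

one_hot, continuous = state[:N_ONE_HOT], state[N_ONE_HOT:]

def is_one_hot(values):
    """True if every entry is 0 or 1 and exactly one entry is 1."""
    return all(v in (0.0, 1.0) for v in values) and sum(values) == 1.0

print("one-hot block valid:", is_one_hot(one_hot))
print("continuous range:", min(continuous), "to", max(continuous))
```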

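My TD error is very low, I think because I preprocessed everything, so the values only move by small amounts. Does anyone know why my sigmoid saturates? Is my network bad, or are my states bad? I would really like to know, because nothing I have tried so far has worked. It either saturates one action (3 = action_bound) at 0.999 with the rest around 0, or converges to 0.999 for all of them, and in one run everything went to 0. I am currently coding with the latest Python and Tensorflow versions.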
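
For context on what "TD error" refers to: the code that computes it is not shown here, so the following is only a sketch of the standard one-step form used for DDPG critics, delta = r + gamma * Q'(s', a') - Q(s, a), with Gamma = 0.9 as stated above. The function names, the `done` flag, and the sample numbers are assumptions for illustration.

```python
GAMMA = 0.9  # discount factor stated in the question

def td_target(reward, q_next, done=False, gamma=GAMMA):
    """One-step bootstrapped target: r + gamma * Q'(s', a'), or just r at episode end."""
    return reward + (0.0 if done else gamma * q_next)

def td_error(reward, q_current, q_next, done=False, gamma=GAMMA):
    """Difference between the bootstrapped target and the critic's current estimate."""
    return td_target(reward, q_next, done, gamma) - q_current

# Hypothetical transitions with rewards in the -1..1 range.
print(td_error(reward=1.0, q_current=0.5, q_next=0.0, done=True))
print(td_error(reward=-1.0, q_current=-1.0, q_next=0.0))
```

A small value here only says that the critic agrees with its own bootstrapped target; it does not by itself say anything about how large the actor's pre-sigmoid outputs have become.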

Thanks in advance for your answers, this means a lot to me!

~Jan

PS: If I have left out any information you need, please let me know.
