I'm experimenting with some code in Jupyter and keep getting stuck here. If I remove the line beginning with "optimizer = ..." along with all references to that line, everything actually works fine. But as soon as I put that line back in, I get an error.
I haven't pasted all the other functions here, to keep the code at a readable size. I'm hoping someone more experienced can immediately spot what's wrong.
Note that the input layer, the 2 hidden layers, and the output layer have 5, 4, 3, and 2 units, respectively.
Code:
tf.reset_default_graph()
num_units_in_layers = [5,4,3,2]
X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
init = tf.global_variables_initializer()
my_sess = tf.Session()
my_sess.run(init)
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters, my_sess)
#my_sess.run(parameters) # Do I need to run this? Or is it obsolete?
cost = compute_cost(ZL, Y, my_sess, parameters, batch_size=3, lambd=0.05)
optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
_ , minibatch_cost = my_sess.run([optimizer, cost],
                                 feed_dict={X: minibatch_X,
                                            Y: minibatch_Y})
print(minibatch_cost)
my_sess.close()
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-321-135b9fc18268> in <module>()
16 cost = compute_cost(ZL, Y, my_sess, parameters, 3, 0.05)
17
---> 18 optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
19 _ , minibatch_cost = my_sess.run([optimizer, cost],
20 feed_dict={X: minibatch_X,
~/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
362 "No gradients provided for any variable, check your graph for ops"
363 " that do not support gradients, between variables %s and loss %s." %
--> 364 ([str(v) for _, v in grads_and_vars], loss))
365
366 return self.apply_gradients(grads_and_vars, global_step=global_step,
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>", "<tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>", "<tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>", "<tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>", "<tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>", "<tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>"] and loss Tensor("Add_3:0", shape=(), dtype=float32).
Note that if I run
print(tf.trainable_variables())
right before the "optimizer = ..." line, I actually do see my trainable variables there:
[<tf.Variable 'weights/W1:0' shape=(4, 5) dtype=float32_ref>, <tf.Variable 'biases/b1:0' shape=(4, 1) dtype=float32_ref>, <tf.Variable 'weights/W2:0' shape=(3, 4) dtype=float32_ref>, <tf.Variable 'biases/b2:0' shape=(3, 1) dtype=float32_ref>, <tf.Variable 'weights/W3:0' shape=(2, 3) dtype=float32_ref>, <tf.Variable 'biases/b3:0' shape=(2, 1) dtype=float32_ref>]
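So the variables do exist; the optimizer just cannot find a path from cost back to them. A quick check along these lines (a diagnostic sketch, not part of my original code) would show which variables are disconnected from the loss:
# Diagnostic sketch: tf.gradients returns None for every variable that the
# loss tensor does not depend on, which pinpoints where the graph is broken.
grads = tf.gradients(cost, tf.trainable_variables())
for var, grad in zip(tf.trainable_variables(), grads):
    print(var.name, "OK" if grad is not None else "NOT connected to cost")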
Does anyone have an idea what the problem might be?
Edit and adding more info: In case you want to see how I create and initialize my parameters, here is the code. Maybe there is something wrong in this part, but I don't see what..
def get_nn_parameter(variable_scope, variable_name, dim1, dim2):
    with tf.variable_scope(variable_scope, reuse=tf.AUTO_REUSE):
        v = tf.get_variable(variable_name,
                            [dim1, dim2],
                            trainable=True,
                            initializer=tf.contrib.layers.xavier_initializer())
    return v
def initialize_layer_parameters(num_units_in_layers):
    parameters = {}
    L = len(num_units_in_layers)
    for i in range(1, L):
        temp_weight = get_nn_parameter("weights",
                                       "W" + str(i),
                                       num_units_in_layers[i],
                                       num_units_in_layers[i-1])
        parameters.update({"W" + str(i): temp_weight})
        temp_bias = get_nn_parameter("biases",
                                     "b" + str(i),
                                     num_units_in_layers[i],
                                     1)
        parameters.update({"b" + str(i): temp_bias})
    return parameters
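For reference, calling it for this architecture (a minimal usage sketch under the same assumptions) yields a dict of tf.Variable objects keyed by name:
# Usage sketch: builds W1..W3 and b1..b3 for the [5, 4, 3, 2] architecture.
params = initialize_layer_parameters([5, 4, 3, 2])
print(sorted(params.keys()))  # ['W1', 'W2', 'W3', 'b1', 'b2', 'b3']
print(params["W1"].shape)     # (4, 5): layer-1 units x input units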
Appendix
I got it working. Instead of writing a separate answer, I'm adding the correct version of my code here.
(David's answer below was very helpful.)
I simply removed my_sess as a parameter of my compute_cost function. (I couldn't get it to work with it before, but it turns out it isn't needed at all.) I also reordered the statements in my main function so that things are called in the right order.
Here is the working version of my cost function, and how I call it:
def compute_cost(ZL, Y, parameters, mb_size, lambd):
    logits = tf.transpose(ZL)
    labels = tf.transpose(Y)
    cost_unregularized = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))
    # Since the dict parameters includes both W and b, it needs to be divided by 2 to find L
    L = len(parameters) // 2
    list_sum_weights = []
    for i in range(0, L):
        list_sum_weights.append(tf.nn.l2_loss(parameters.get("W" + str(i+1))))
    regularization_effect = tf.multiply((lambd / mb_size), tf.add_n(list_sum_weights))
    cost = tf.add(cost_unregularized, regularization_effect)
    return cost
And here is the main function where I call the compute_cost(..) function:
tf.reset_default_graph()
num_units_in_layers = [5,4,3,2]
X = tf.placeholder(shape=[5, 3], dtype=tf.float32)
Y = tf.placeholder(shape=[2, 3], dtype=tf.float32)
parameters = initialize_layer_parameters(num_units_in_layers)
my_sess = tf.Session()
ZL = forward_propagation_with_relu(X, num_units_in_layers, parameters)
cost = compute_cost(ZL, Y, parameters, 3, 0.05)
optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
init = tf.global_variables_initializer()
my_sess.run(init)
_ , minibatch_cost = my_sess.run([optimizer, cost],
                                 feed_dict={X: [[-1.,4.,-7.],[2.,6.,2.],[3.,3.,9.],[8.,4.,4.],[5.,3.,5.]],
                                            Y: [[0.6, 0., 0.3], [0.4, 0., 0.7]]})
print(minibatch_cost)
my_sess.close()
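My best guess at why the old version failed (an assumption on my part, since the old compute_cost isn't shown here): calling my_sess.run(...) inside the cost computation returns plain NumPy values, and a cost built from those values is a constant as far as the graph is concerned, so no gradients can flow back to the variables. A minimal sketch that reproduces the same error:
import tensorflow as tf

# Hypothetical minimal reproduction (my assumption about the old bug, not the
# actual old compute_cost): a tensor evaluated with sess.run() comes back as a
# NumPy array, and any cost built from that array is cut off from the graph.
tf.reset_default_graph()
w = tf.get_variable("w", shape=[1], initializer=tf.zeros_initializer())
z = w * 2.0

sess = tf.Session()
sess.run(tf.global_variables_initializer())

z_value = sess.run(z)                                  # NumPy array, graph link lost
bad_cost = tf.reduce_mean(tf.constant(z_value) ** 2)   # constant w.r.t. w

# The next line raises the same "No gradients provided for any variable" error:
# tf.train.AdamOptimizer(0.001).minimize(bad_cost)
sess.close()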