tensorflow - Unrolling a TensorFlow loop to avoid frequent GPU kernel overhead

Question

Consider the program below, in which I compute B += A over many iterations.

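import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

A = tf.constant(np.random.randn(1000000))
B = tf.constant(np.random.randn(1000000))

init = tf.global_variables_initializer()
with tf.Session() as sess:
  sess.run(init)
  for i in range(100):
    B = tf.add(A, B)  # creates a new add node on every iteration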

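Obviously, the loop above causes at least 100 kernel launches, which seems unnecessary because I am really doing this addition in place. Is there a way to avoid the kernel launch overhead? Ideally, I am looking for a TensorFlow API solution (with only one call to run) that does not change the logic of B += A.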

score 0 · Accepted Answer

You can use tf.while_loop.

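# a and b are the input tensors (A and B in the question); i counts down from 100.
i = tf.constant(100)
op = tf.while_loop(
  lambda a, b, i: tf.greater(i, 0),  # condition: keep looping while i > 0
  lambda a, b, i: (a+b, b, i-1),     # body: returns the next (a, b, i)
  (a, b, i))                         # initial loop variables
res = op[0]  # final value of the first loop variable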

res contains the "value of a" after the loop has run. Note that a itself is not actually changed and still holds its starting value.
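
To make the note about a concrete, here is a minimal plain-Python sketch of the loop contract the snippet above relies on. It is only a model for illustration: the helper while_loop below is hypothetical and runs an ordinary Python loop, whereas the real tf.while_loop builds graph ops that execute inside a single run call. The scalar starting values are made up.

```python
def while_loop(cond, body, loop_vars):
    """Hypothetical plain-Python model of a functional while loop.

    Repeatedly applies `body` to the loop variables while `cond` holds,
    and returns the final loop variables as a tuple.
    """
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars


# Made-up scalar inputs standing in for the tensors.
a, b, i = 1.0, 2.0, 100

op = while_loop(
    lambda a, b, i: i > 0,               # keep looping while i > 0
    lambda a, b, i: (a + b, b, i - 1),   # next values of (a, b, i)
    (a, b, i),
)
res = op[0]

print(res)  # 201.0, i.e. 1.0 + 100 * 2.0
print(a)    # 1.0, the original a is untouched
```

The loop variables are passed in and new values are returned, so the result has to be read from the return value (op[0]) rather than from a.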

score 0 · Accepted Answer

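The first thing you need to understand about TF is that you have to separate the definition of the graph from its execution. This will save you hours of debugging and hunting for inefficiencies when you work on real problems.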

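Your current problem arises because you have not done this. In your loop, you create a new graph node on every iteration (100 times). If you want, check your graph in TensorBoard; or, if you are lazy, just increase the iteration count to a very large value and your program will crash with something like graph is bigger than 2Gb.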
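
The effect can be illustrated without TensorFlow by using a toy graph that only records nodes. This is a rough model under stated assumptions, not how TensorFlow is implemented: ToyGraph, build_in_loop and define_then_run are hypothetical names, and the values 3 and 1 are arbitrary.

```python
class ToyGraph:
    """Hypothetical stand-in for a dataflow graph: it only records nodes."""

    def __init__(self):
        self.nodes = []

    def constant(self, value):
        self.nodes.append(("const", value))
        return len(self.nodes) - 1  # node id

    def add(self, x, y):
        self.nodes.append(("add", x, y))
        return len(self.nodes) - 1  # node id


def build_in_loop(steps):
    """Mimics the question: a new add node is created on every iteration."""
    g = ToyGraph()
    a = g.constant(3)
    b = g.constant(1)
    for _ in range(steps):
        b = g.add(a, b)
    return len(g.nodes)


def define_then_run(steps):
    """Mimics define-then-run: one add node, evaluated `steps` times."""
    g = ToyGraph()
    a = g.constant(3)
    b = g.constant(1)
    g.add(a, b)  # the update is defined once, outside the loop
    state = 1
    for _ in range(steps):
        state = 3 + state  # stands in for running that single node once
    return len(g.nodes), state


print(build_in_loop(100))    # 102 nodes: 2 constants + 100 adds
print(define_then_run(100))  # (3, 301): 3 nodes, final value 301
```

In the first function the node count grows with the number of iterations; in the second it stays fixed no matter how many times the update is run.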

To do this in a better way, define the graph first and then execute it. To reassign a value, use the assign operator.

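import tensorflow as tf  # TensorFlow 1.x API

A = tf.constant(3)  # change to your random stuff
B = tf.Variable(1)  # change to your random stuff
B_new = B.assign(tf.add(A, B))  # defined once: one add op and one assign op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        res = sess.run(B_new)  # reruns the same op; the graph does not grow
    print(res)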

Finally, you obviously do not need a loop at all:

import tensorflow as tf  # TensorFlow 1.x API

A = tf.constant(3)
B = tf.constant(1)
C = 100 * A + B  # closed form of adding A to B 100 times

with tf.Session() as sess:
    print(sess.run(C))
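
As a quick sanity check of the closed form, the following plain-Python sketch compares the 100-step loop with 100 * A + B for the scalar values used above. The function names are made up for illustration.

```python
def repeated_add(a, b, steps):
    """Apply b += a `steps` times and return the final b."""
    for _ in range(steps):
        b += a
    return b


def closed_form(a, b, steps):
    """Same result computed directly, without a loop."""
    return steps * a + b


a, b, steps = 3, 1, 100
looped = repeated_add(a, b, steps)
direct = closed_form(a, b, steps)

print(looped)  # 301
print(direct)  # 301
```

With integers the two agree exactly. With floating-point tensors such as the random arrays in the question, the results may differ slightly in the last bits, because repeated addition rounds at every step.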

score 0 · Accepted Answer

Basically, you are creating 100 assign and add operations in the graph, which is probably not what you want.

This code should do what you want.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    A = tf.constant(np.random.randn(1000000))
    # B has to be a variable so we can assign to it
    B = tf.Variable(np.random.randn(1000000))

    # Add the assign and addition operators to the graph
    assign_to_B_op = B.assign(tf.add(A, B))

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        # To ensure we don't add new ops to the Graph by mistake.
        sess.graph.finalize()
        sess.run(init)
        for i in range(100):
            sess.run(assign_to_B_op)
            print(B.eval())
