Generally speaking, there are some good examples that use TF optimizers for solving general (non-deep-learning) problems. Given:
https://databricks.com/tensorflow/training-and-convergence https://colab.research.google.com/notebooks/tpu.ipynb#scrollTo=a_rjVo-RAoYd
we would like to combine the two and make use of TPU-based optimization for a high-dimensional problem.
To that end I have a simple colab notebook that merges the two examples above:
```python
import os
import pprint

import numpy as np
import tensorflow as tf
from tensorflow.contrib.tpu.python.tpu import tpu_function

if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    with tf.Session(tpu_address) as session:
        devices = session.list_devices()
        print('TPU devices:')
        pprint.pprint(devices)

# Add this somewhere at the top
tpu_function.get_tpu_context().set_number_of_shards(8)

# x and y are placeholders for our training data
x = tf.placeholder("float")
y = tf.placeholder("float")

# w is the variable storing our values. It is initialised with starting "guesses"
# w[0] is the "a" in our equation, w[1] is the "b"
w = tf.Variable([1.0, 2.0, 3.0, 4.0], name="w")

# Our model of y = a*x + b
y_model = tf.multiply(x, w[0]) + w[1] + w[2] + 3

# Our error is defined as the square of the differences
error = tf.square(y - y_model)

# The Gradient Descent Optimizer does the heavy lifting
train_op = tf.train.AdamOptimizer(0.01)
optimizer = tf.contrib.tpu.CrossShardOptimizer(train_op).minimize(error)  # TPU change 1

# Normal TensorFlow - initialize values, create a session and run the model
model = tf.global_variables_initializer()

with tf.Session(tpu_address) as session:
    session.run(tf.contrib.tpu.initialize_system())
    print('init')
    session.run(model)
    for i in range(10000):
        print(i)
        x_value = np.random.rand()
        y_value = x_value * 2 + 6 + 5 + 3
        session.run(optimizer, feed_dict={x: x_value, y: y_value})

    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}+{c:.3f}x + {d:.3f}".format(
        a=w_value[0], b=w_value[1], c=w_value[2], d=w_value[3]))

    session.run(tf.contrib.tpu.shutdown_system())
```
When I run it (in colab), it only gets through the first loop iteration, printing:
init
0
and then does nothing further; colab just keeps spinning.
If I do not use
optimizer = tf.contrib.tpu.CrossShardOptimizer(train_op).minimize(error)
and the other TPU functions, then it estimates the w variable just fine.
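For context, without the TPU machinery the script is just doing gradient descent on a linear least-squares fit. A minimal NumPy sketch of that working baseline (the learning rate, step count, and sample size are arbitrary choices of mine), fitting a slope and a single combined intercept to the same target y = 2x + 14:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)              # inputs in [0, 1)
y = 2.0 * x + 14.0               # same target as y_value = x * 2 + 6 + 5 + 3

a, b = 1.0, 2.0                  # starting guesses, as in the graph's w
lr = 0.1
for _ in range(5000):
    err = (a * x + b) - y
    # gradients of the mean squared error w.r.t. a and b
    grad_a = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)                      # converges near a = 2, b = 14
```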
The questions are:
- Why doesn't this work, and how can we get the cross-shard optimizer to optimize this simple function?
- How should I shape the variable w to take advantage of parallel batches/shards on the TPU?
- How can we make this more efficient with an equivalent dataset prefetch operation, or by using input queues?
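On the second question, what I picture is replacing the scalar feed with one global batch carrying a per-shard leading dimension, so each of the 8 shards gets its own slice. A NumPy sketch of that data shape (num_shards matches set_number_of_shards(8) above; batch_size is a hypothetical choice of mine):

```python
import numpy as np

num_shards = 8      # matches set_number_of_shards(8) above
batch_size = 32     # hypothetical per-shard batch size

# One global batch: shard i would receive the [batch_size] slice x_batch[i]
x_batch = np.random.rand(num_shards, batch_size).astype(np.float32)
y_batch = x_batch * 2 + 6 + 5 + 3   # same target function as above

print(x_batch.shape, y_batch.shape)   # (8, 32) (8, 32)
```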
The goal is to use the lower-level TPU APIs without TPUEstimator, e.g. to help solve custom problems by leveraging the power of TPUs using just tensors, queues, and shards.
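On the prefetch question, my understanding is that prefetching simply keeps a small buffer of batches filled on a background thread, so the accelerator never waits on input. A plain-Python sketch of that concept (an illustration only, not the actual TF Dataset.prefetch API):

```python
import queue
import threading

def prefetch(generator, buffer_size=4):
    """Yield items from `generator`, producing them ahead of the
    consumer on a background thread, bounded by `buffer_size`."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()   # sentinel marking the end of the stream

    def producer():
        for item in generator:
            q.put(item)   # blocks when the buffer is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

# The consumer sees the same items in the same order,
# but production overlaps with consumption.
print(list(prefetch(i * i for i in range(5))))   # [0, 1, 4, 9, 16]
```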