python - How do I use Google Colab's TPU with Tensorflow 2.0?

Question

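I am trying to train a neural network on a Tensor Processing Unit (TPU) using Google Colab. Tensorflow has just released a major version, 2.0, so I am trying to do this in Tensorflow 2.0. I tried the following three guides, but all of them were written for Tensorflow 1.14 and fail in Tensorflow 2.0:

1) Following the TPUs in Colab guide, I get the error: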

AttributeError: module 'tensorflow' has no attribute 'Session'

(from the line: with tf.Session(tpu_address) as session:)

2) Following the guide Simple Classification Model using Keras on Colab TPU, I get the same error.

3) Following the guide cloud_tpu_custom_training, I get the error:

AttributeError: module 'tensorflow' has no attribute 'contrib'

(from the line: resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER))

Does anyone have an example of training a neural network on a TPU in Tensorflow 2.0?

EDIT: This issue also seems to have been raised on GitHub: InvalidArgumentError: Unable to find a context_id matching the specified one #1

score 1 · Accepted Answer

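TPU support was finally added in Tensorflow 2.1.0 (as of January 8, 2020). From the release notes at https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0:

Experimental support for Keras .compile, .fit, .evaluate, and .predict is available for Cloud TPUs, for all types of Keras models (sequential, functional and subclassing models).

The tutorial is available here: https://www.tensorflow.org/guide/tpu

For completeness, I'll add a walkthrough here:

Go to Google Colab and create a new Python 3 notebook here: https://colab.research.google.com/
In the toolbar, click Runtime / Change runtime type, then select "TPU" under Hardware accelerator.
Copy and paste the code below into the notebook and click Run cell (the play button).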

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import os
import tensorflow_datasets as tfds

# Distribution strategies
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# MNIST model
def create_model():
  return tf.keras.Sequential(
      [tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(128, activation='relu'),
       tf.keras.layers.Dense(10)])

# Input datasets
def get_dataset(batch_size=200):
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True,
                             try_gcs=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.0

    return image, label

  train_dataset = mnist_train.map(scale).shuffle(10000).batch(batch_size)
  test_dataset = mnist_test.map(scale).batch(batch_size)

  return train_dataset, test_dataset

# Create and train a model
strategy = tf.distribute.experimental.TPUStrategy(resolver)
with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])

train_dataset, test_dataset = get_dataset()

model.fit(train_dataset,
          epochs=5,
          validation_data=test_dataset,steps_per_epoch=50)

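Note that when I ran the code from the Tensorflow tutorial as-is, I got the error below. I fixed it by adding the steps_per_epoch argument to model.fit().

ValueError: Number of steps could not be inferred from the data, please pass the steps_per_epoch argument.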
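
The value 50 passed to model.fit() avoids the error, but with a batch size of 200 it covers only 10,000 of MNIST's 60,000 training images per epoch. Below is a minimal sketch of deriving the step count from the dataset size instead. The helper name `steps_per_epoch_for` is hypothetical; inside get_dataset() the example count could presumably be read from `info.splits['train'].num_examples`.

```python
import math


def steps_per_epoch_for(num_examples, batch_size, drop_remainder=False):
    """Number of batches needed to see every example once (hypothetical helper)."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    if drop_remainder:
        # A final partial batch is discarded.
        return num_examples // batch_size
    # A final partial batch still counts as one step.
    return math.ceil(num_examples / batch_size)


# MNIST has 60,000 training images; the walkthrough uses batch_size=200.
print(steps_per_epoch_for(60000, 200))
```

The result could then be passed as steps_per_epoch in place of the hard-coded 50.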

score 1 · Accepted Answer

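First, the code given in the tutorials is not compatible with 2.x.

In Colab, you need to select TPU as the runtime in order to run code on a TPU.
For the error

AttributeError: module 'tensorflow' has no attribute 'Session'

you need to use tf.compat.v1.Session(), because tf.Session has been removed from the top-level API in 2.x.
Instead of tf.contrib.cluster_resolver, use tf.distribute.cluster_resolver.

Please refer to the Tensorflow Addon-repo for converting code from 1.x to be 2.x compatible.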
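
The two renames above can be sketched as a small text rewrite. This is only an illustration of the mapping, not a replacement for a real migration tool; the names `RENAMES` and `apply_renames` are hypothetical.

```python
import re

# Mapping taken from the two renames named in this answer.
RENAMES = {
    r"\btf\.Session\b": "tf.compat.v1.Session",
    r"\btf\.contrib\.cluster_resolver\b": "tf.distribute.cluster_resolver",
}


def apply_renames(source):
    """Rewrite the two 1.x names in a source string (hypothetical helper)."""
    for pattern, replacement in RENAMES.items():
        source = re.sub(pattern, replacement, source)
    return source


old_line = "resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER)"
print(apply_renames(old_line))
```

Renaming alone only makes these two lines resolve; the rest of a 1.x script may still need further changes.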

score 0 · Accepted Answer

Upgrading to a newer version of Tensorflow resolves the issue above.

!pip install tensorflow==2.7.0

Answered 2021-11-28T09:24:14.253
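
According to the release notes quoted in an earlier answer, Keras support on TPUs arrived in 2.1.0, so it may be worth checking the installed version before upgrading. A minimal sketch, assuming the version string has the usual major.minor prefix; the helper name `has_keras_tpu_support` is hypothetical.

```python
import re


def has_keras_tpu_support(version):
    """True if a version string such as '2.7.0' is at least 2.1 (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (2, 7) >= (2, 1) and (1, 14) < (2, 1).
    return (major, minor) >= (2, 1)


print(has_keras_tpu_support("2.0.0"))
print(has_keras_tpu_support("2.7.0"))
```

In a notebook, the argument would be tf.__version__.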

score 0 · Accepted Answer

Before running the code, go to

Edit --> Notebook Settings

and under that, select

Hardware Accelerator --> TPU
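
One way to confirm the setting took effect is to look for the COLAB_TPU_ADDR environment variable that the walkthrough code in an earlier answer reads. A minimal sketch; the helper name `colab_tpu_address` is hypothetical, and it is an assumption that the variable is absent on non-TPU runtimes (other or newer runtimes may expose the TPU differently).

```python
import os


def colab_tpu_address(environ=None):
    """Return the gRPC address of the Colab TPU, or None if none is advertised."""
    env = os.environ if environ is None else environ
    addr = env.get("COLAB_TPU_ADDR")
    if not addr:
        # Assumption: no variable means the runtime was not started with a TPU.
        return None
    return "grpc://" + addr


address = colab_tpu_address()
if address is None:
    print("No TPU address found; check the hardware accelerator setting.")
else:
    print("TPU available at", address)
```

The returned string has the same form as the tpu= argument passed to TPUClusterResolver in the walkthrough.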

score -1 · Accepted Answer

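Tensorflow 2.0 is not really backwards compatible with Tensorflow 1.X code. A lot has changed in how Tensorflow works between these versions, so I strongly recommend reading the official guide on how to migrate your code: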

https://www.tensorflow.org/guide/migrate#estimators

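I will say that the automatic conversion script, while technically successful, only changed my code into a compatibility version of the Tensorflow 1.X code. If you want to use any actual Tensorflow 2.0 features, you will probably need to change the code by hand.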
