python - OpenAI Gradient Checkpointing with Tensorflow Eager Execution

Question

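I recently switched to TensorFlow Eager (currently on TF 1.8.0) and really like it. However, I now have a fairly large model that does not fit into my GPU memory (GTX 1080Ti, 12GB VRAM) when it is run with the gradient tape that TF needs to compute gradients. The forward pass (i.e. without the gradient tape) works fine.

I thought about using OpenAI's Gradient Checkpointing, hoping it would help. However, simply using it as described in their Git repository does not seem to help with Eager Execution, i.e.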

import tensorflow as tf
import tensorflow.contrib.eager as tfe
import memory_saving_gradients
tf.__dict__["gradients"] = memory_saving_gradients.gradients_memory
# using gradients_memory or gradients_speed does not change anything
# tf.__dict__["gradients"] = memory_saving_gradients.gradients_speed

[...]
with tfe.GradientTape() as g:
    output = run_large_model()
    loss = calculate_loss_on_output(output)
grads = g.gradient(loss, model.variables)
optimizer.apply_gradients(zip(grads, model.variables))

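This runs out of memory regardless of whether gradient checkpointing is used.

My guess is that the gradient tape still stores all variables and the information needed for the backward pass, and that gradient checkpointing has no effect because TF in Eager mode does not actually build a graph (as far as I understand, or at least it is a different graph).

Do you have any experience with this or any ideas on how to solve it, or what I would need to do to use gradient checkpointing in TF Eager mode?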

score 4 · Accepted Answer

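OpenAI's gradient checkpointing code is based on graph rewriting, so it does not support Eager Execution.

The tensorflow.contrib.layers library has a recompute_grad decorator that is equivalent but is supported in both graph and eager execution.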
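For intuition about what recomputation trades off, here is a minimal pure-Python sketch of the idea. It is not TensorFlow code and not how recompute_grad or the OpenAI package is implemented. The scalar tanh "layers", the function names, and the segment size are all hypothetical and chosen only for illustration. Plain backprop keeps every layer input (as a tape would), while the checkpointed version keeps one input per segment and recomputes the rest during the backward pass.

```python
import math

# Hypothetical stand-in for a deep model: a chain of scalar layers y = tanh(w * x).
def layer(w, x):
    return math.tanh(w * x)

def layer_grads(w, x, dy):
    """Return (dL/dx, dL/dw) for y = tanh(w * x), given dL/dy."""
    y = math.tanh(w * x)
    dz = dy * (1.0 - y * y)
    return dz * w, dz * x

def backward_full(weights, x0):
    """Plain backprop: store every layer input, as a tape would."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    dy, grads = 1.0, [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        dy, grads[i] = layer_grads(weights[i], inputs[i], dy)
    return x, grads, len(inputs)

def backward_checkpointed(weights, x0, segment=4):
    """Store only every `segment`-th input; recompute the others when needed."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    dy, grads = 1.0, [0.0] * len(weights)
    peak = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        # Rebuild this segment's inputs from its checkpoint (the extra compute).
        seg_inputs = [checkpoints[start]]
        for j in range(start, end - 1):
            seg_inputs.append(layer(weights[j], seg_inputs[-1]))
        # The segment's first input is the checkpoint itself, so count it once.
        peak = max(peak, len(checkpoints) + len(seg_inputs) - 1)
        for j in reversed(range(start, end)):
            dy, grads[j] = layer_grads(weights[j], seg_inputs[j - start], dy)
    return x, grads, peak

if __name__ == "__main__":
    weights = [0.5 + 0.05 * i for i in range(16)]
    _, grads_full, stored_full = backward_full(weights, 0.3)
    _, grads_ckpt, stored_ckpt = backward_checkpointed(weights, 0.3, segment=4)
    print("stored activations:", stored_full, "vs", stored_ckpt)
    print("max gradient difference:",
          max(abs(a - b) for a, b in zip(grads_full, grads_ckpt)))
```

Both functions return the same gradients. The checkpointed one holds fewer activations at once and pays for that with roughly one extra forward pass through each segment.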
