While reading the XLA code in TensorFlow, I came across the following snippet:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/jit/xla_launch_util.h#L119
// Adapter class that wraps a Tensorflow allocator as an XLA allocator.
// Assumes that the Tensorflow allocator permits asynchronous deallocation:
// see comment on `AllowsAsynchronousDeallocation()`.
class XlaAllocator : public xla::DeviceMemoryAllocator {
 public:
  XlaAllocator(const se::Platform* platform, Allocator* wrapped);
  ~XlaAllocator() override;
  xla::StatusOr<xla::OwningDeviceMemory> Allocate(
      int device_ordinal, uint64 size, bool retry_on_failure) override;
  Status Deallocate(int device_ordinal, se::DeviceMemoryBase mem) override;

  // The Tensorflow BFC allocator used on GPU allows host-side deallocation
  // before GPU execution takes place. Tensorflow uses the ordering of the main
  // compute stream to enforce a happens-before relationship between a memory
  // allocation and code that reuses the same memory. If Tensorflow adds
  // support for multiple GPU streams or allocators with different ordering
  // requirements, this code may need to change.
  // (This attribute has no effect on CPU.)
  bool AllowsAsynchronousDeallocation() const override { return true; }

 private:
  Allocator* wrapped_;
};
As the comment says, the TensorFlow BFC allocator used on GPU allows host-side deallocation before GPU execution takes place.
This really confuses me. After digging into the BFCAllocator code, I found nothing that supports this claim.
So my questions are:
- Does the TensorFlow BFC allocator really support asynchronous deallocation from the host side?
- If it does, what am I missing?
Thanks!