tensorflow - WinML inference on the GPU is 3 times slower than TensorFlow in Python

Question

I am trying to use a TensorFlow model trained in Python with WinML. I successfully converted the protobuf to ONNX and obtained the following performance results:

WinML 43s
OnnxRuntime 10s
TensorFlow 12s

Inference on the CPU takes about 86 s.

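In the performance tools, WinML does not seem to use the GPU properly compared with the other two. WinML appears to use DirectML as its backend (we see the DML prefix in the Nvidia GPU profiler). Is it possible to use the CUDA inference engine with WinML? Has anyone else observed similar results, with WinML being abnormally slow on the GPU?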
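For anyone trying to reproduce the comparison, here is a minimal Python sketch of the ONNX Runtime side only. It does not make WinML use CUDA. It shows one way to pick an ONNX Runtime execution provider and to time inference with warm-up runs excluded. The preference order, the helper names (pick_providers, time_inference, make_session) and the warm-up and iteration counts are assumptions for illustration; the question does not say how the timings above were measured.

```python
import time
from typing import Callable, Iterable, List

# Assumed preference order for illustration: CUDA, then DirectML, then CPU.
PREFERRED = ("CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str],
                   preferred: Iterable[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in preference order."""
    available = set(available)
    return [p for p in preferred if p in available]


def time_inference(run: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Return the mean wall-clock seconds per call, excluding warm-up calls."""
    for _ in range(warmup):
        run()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters


def make_session(model_path: str):
    """Create an ONNX Runtime session on the first available preferred provider."""
    import onnxruntime as ort  # lazy import so the helpers above work without it

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Usage would look like session = make_session("model.onnx") followed by time_inference(lambda: session.run(None, feeds)), where "model.onnx" and feeds (a dict of input name to array) are placeholders for your own model and inputs. session.get_providers() reports which backend was actually selected.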

score 2 · Accepted Answer

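I got some answers about this WinML performance issue. My network uses LeakyRelu, which DirectML supports only from Windows 10 version 2004 onward. On earlier versions of Windows, this prevents the DirectML metacommands from being used, hence the poor performance. On the new Windows version, I get good performance with WinML.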
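To check whether a given setup is likely to hit the same problem, a rough sketch in Python could look like the following. It lists the operators in the ONNX model and compares the Windows build number against 19041, the build that corresponds to version 2004. The helper names are hypothetical, and the rule "LeakyRelu plus a build below 19041 means the slow path" is only an assumption drawn from the explanation above, not an official DirectML check.

```python
import sys
from collections import Counter
from typing import Iterable, List, Optional

# Windows 10 version 2004 corresponds to OS build 19041.
BUILD_2004 = 19041


def windows_build() -> Optional[int]:
    """Return the Windows build number, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    return sys.getwindowsversion().build


def count_ops(op_types: Iterable[str]) -> Counter:
    """Count how often each operator type occurs."""
    return Counter(op_types)


def model_op_types(model_path: str) -> List[str]:
    """List operator types in the top-level graph of an ONNX model (needs onnx)."""
    import onnx  # lazy import so the other helpers work without it

    model = onnx.load(model_path)
    return [node.op_type for node in model.graph.node]


def likely_slow_path(op_types: Iterable[str], build: Optional[int]) -> bool:
    """Assumed heuristic: the model uses LeakyRelu and the OS predates version 2004."""
    has_leaky = count_ops(op_types)["LeakyRelu"] > 0
    return has_leaky and build is not None and build < BUILD_2004
```

For example, likely_slow_path(model_op_types("model.onnx"), windows_build()) with your own model path in place of "model.onnx". Operators nested inside subgraphs (If, Loop) are not visited by this sketch.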
