cuda - Why can't I overlap asynchronous memcpy with kernel execution on fermi on win7 and CUDA 5.0?

Question

I cannot even achieve overlapping memcpy and kernel execution with the simpleStreams example in the CUDA SDK, let alone in my own programs. These threads argue it is a problem with the WDDM driver in windows:

and suggest to:

flush the WDDM queue with cudaEventQuery() or cudaStreamQuery(). (Does not work).
submit streams in breadth first manner. (Does not work).
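
For reference, the two suggestions above can be combined in a minimal sketch like the following. It is not the simpleStreams code: the `scale` kernel, the stream count and the chunk size are assumptions chosen for illustration. Each stage is issued for all streams before the next stage (breadth first), and a non-blocking cudaStreamQuery() follows each submission in the hope that the WDDM driver flushes its queued work.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel, for illustration only: scales n floats in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host-side check: number of elements that differ from the expected value.
int countMismatches(const float *data, int n, float expected)
{
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] != expected) ++bad;
    return bad;
}

int main()
{
    const int nStreams = 4;       // assumed stream count
    const int chunk = 1 << 20;    // assumed elements per stream
    const size_t bytes = chunk * sizeof(float);

    float *h = NULL, *d = NULL;
    // Async copies can only overlap with kernels if the host buffer is pinned.
    CUDA_CHECK(cudaMallocHost((void **)&h, nStreams * bytes));
    CUDA_CHECK(cudaMalloc((void **)&d, nStreams * bytes));
    for (int i = 0; i < nStreams * chunk; ++i) h[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s)
        CUDA_CHECK(cudaStreamCreate(&streams[s]));

    // Stage 1: host-to-device copies for every stream.
    for (int s = 0; s < nStreams; ++s) {
        CUDA_CHECK(cudaMemcpyAsync(d + s * chunk, h + s * chunk, bytes,
                                   cudaMemcpyHostToDevice, streams[s]));
        cudaStreamQuery(streams[s]);  // non-blocking; result ignored on purpose
    }
    // Stage 2: kernels for every stream.
    for (int s = 0; s < nStreams; ++s) {
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + s * chunk,
                                                           chunk, 2.0f);
        CUDA_CHECK(cudaGetLastError());  // catch launch configuration errors
        cudaStreamQuery(streams[s]);
    }
    // Stage 3: device-to-host copies for every stream.
    for (int s = 0; s < nStreams; ++s) {
        CUDA_CHECK(cudaMemcpyAsync(h + s * chunk, d + s * chunk, bytes,
                                   cudaMemcpyDeviceToHost, streams[s]));
        cudaStreamQuery(streams[s]);
    }

    CUDA_CHECK(cudaDeviceSynchronize());
    int bad = countMismatches(h, nStreams * chunk, 2.0f);
    std::printf("mismatched elements: %d\n", bad);

    for (int s = 0; s < nStreams; ++s)
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    CUDA_CHECK(cudaFree(d));
    CUDA_CHECK(cudaFreeHost(h));
    return bad == 0 ? 0 : 1;
}
```

Whether the copies and kernels actually overlap has to be checked on a timeline in a profiler; the program itself only verifies that the results are correct.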

This thread argues it is a bug in fermi:

How can I overlap memory transfers and kernel execution in a CUDA application?

This thread:

http://blog.icare3d.org/2010/04/tesla-compute-drivers.html

proposes a solution to mitigate the problems with WDDM on windows. However, it only works for a Tesla card and it requires an additional video card to steer the display, since the proposed drivers are compute-only drivers.

However, none of these threads provide a real solution. I would appreciate it, if NVIDIA could comment on this problem and come up with a solution, since apparently a lot of people are experiencing this problem.

score 2 · Accepted Answer

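TL;DR: The issue is caused by the WDDM TDR delay option in Nsight Monitor! When the TDR "enabled" option is set to false, the issue appears. If you instead set the TDR delay value to a very high number and the "enabled" option to true, the issue goes away.

Read on for the other (older) steps I followed before arriving at the solution above, and for some other possible causes.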

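I was only recently able to mostly solve this problem! I think it is specific to Windows and Aero. Please try these steps and post your results to help others! I have tried them on a GTX 650 and a GT 640.

Before you do anything, consider using both the onboard GPU (for display) and the discrete GPU (for computation), because there are verified issues with the NVIDIA driver for Windows! When you use the onboard GPU, that driver does not get fully loaded, so many bugs are avoided. The system also stays responsive while you work!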

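Make sure your concurrency problem is not related to other issues such as old drivers (including the BIOS), wrong code, an incapable device, etc.
Go to Computer > Properties.
Select Advanced system settings on the left side.
Go to the Advanced tab.
Under Performance, click Settings.
In the Visual Effects tab, select the "Adjust for best performance" option.

This will disable Aero and almost all visual effects. If this configuration works, you can try re-enabling the visual effects boxes one by one until you find the precise one that causes the problem!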
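
For the first step in the list above (ruling out an incapable device), a small query program can help. The sketch below uses cudaDeviceGetAttribute() to print whether each device has copy engines that can overlap with kernels, whether a kernel run-time limit is active, and whether the TCC driver is in use. The helper name `describeCopyEngines` is hypothetical, and reading the run-time limit as a sign of the Windows display watchdog (TDR) is my assumption, not something the runtime reports directly.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: turns the async engine count into a readable summary.
const char *describeCopyEngines(int asyncEngineCount)
{
    if (asyncEngineCount <= 0) return "no copy/kernel overlap";
    if (asyncEngineCount == 1) return "a copy in one direction can overlap a kernel";
    return "copies in both directions can overlap a kernel";
}

int main()
{
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));

    for (int dev = 0; dev < count; ++dev) {
        int engines = 0, concurrentKernels = 0, timeLimit = 0, tcc = 0;
        CUDA_CHECK(cudaDeviceGetAttribute(&engines,
                                          cudaDevAttrAsyncEngineCount, dev));
        CUDA_CHECK(cudaDeviceGetAttribute(&concurrentKernels,
                                          cudaDevAttrConcurrentKernels, dev));
        CUDA_CHECK(cudaDeviceGetAttribute(&timeLimit,
                                          cudaDevAttrKernelExecTimeout, dev));
        CUDA_CHECK(cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, dev));

        std::printf("device %d\n", dev);
        std::printf("  async engines:      %d (%s)\n", engines,
                    describeCopyEngines(engines));
        std::printf("  concurrent kernels: %s\n", concurrentKernels ? "yes" : "no");
        // Assumption: on Windows a run-time limit usually means the display
        // watchdog (TDR) applies to this device.
        std::printf("  kernel time limit:  %s\n", timeLimit ? "yes" : "no");
        std::printf("  TCC driver:         %s\n", tcc ? "yes" : "no (e.g. WDDM)");
    }
    return 0;
}
```

If the async engine count is 0, no driver or Windows setting will produce overlap on that device.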

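Alternatively, you can:

Right-click on the desktop and select Personalize.
Select one of the basic themes that does not use Aero.

This works like the steps above, but leaves more visual options enabled. This setting also works for my two devices, so I kept it.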

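When you try these solutions, please come back here and post your findings!

For me, this solved the problem in most cases (a tiled dgemm I wrote), but NOTE that I still cannot run "simpleStreams" properly and achieve concurrency...

UPDATE: The problem is fully solved with a new Windows installation! The previous steps improved the behavior in some cases, but a fresh install solved all the problems!

I will try to find a less radical way of solving this problem; maybe restoring just the registry will be enough.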
