c++ - LibTorch (C++) with CUDA throws an exception

Question

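I am trying to create a neural network with LibTorch 1.3 and C++, using CUDA 10.1 on Windows 10. I build with Visual Studio 2019.

So far I have tried the basic example and the MNIST example, and both work on the CPU. However, I cannot run them with CUDA. I tried to move the model to the GPU as described in the passage below, but it does not work.

To move your model to GPU memory, you can write model.to(at::kCUDA);. Make sure the inputs to a model also live in CUDA memory by calling tensor.to(at::kCUDA), which will return a new tensor in CUDA memory.

So I tried the simple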

int main(){
    auto net = std::make_shared<Net>();
    net->to(torch::kCUDA); //crashes here
}

Then I tried moving plain tensors to GPU memory, but that crashes as well.

#include <torch/torch.h>

int main() 
{
    torch::Tensor a = torch::ones({ 2, 2 }, torch::requires_grad());
    torch::Tensor b = torch::randn({ 2, 2 });
    a = a.to(torch::kCUDA);    // crashes here; to() returns a new tensor, so the result is kept
    b = b.to(torch::kCUDA);
    auto c = a + b;
}

I get:

Exception thrown at 0x00007FFB8263A839 in Resnet50.exe: Microsoft C++ exception: c10::Error at memory location 0x000000E574979F30.
Unhandled exception at 0x00007FFB8263A839 in Resnet50.exe: Microsoft C++ exception: c10::Error at memory location 0x000000E574979F30.

The exception is raised in KernelBase.dll. In debug mode, the debugger points to:

auto operator()(Parameters... args) -> decltype(std::declval<FuncType>()(std::forward<Parameters>(args)...)) {
  return kernel_func_(std::forward<Parameters>(args)...);
}

Calling torch::cuda::is_available() shows that the CUDA device can be found.

I do not have much experience with exceptions.

score 2 · Accepted Answer

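Hi, I had the same problem. I solved it by installing the LibTorch build for CUDA 9.2. I downloaded the release build from https://pytorch.org/, together with CUDA Toolkit 9.2 and the cuDNN build for CUDA 9.2.

I am using Visual Studio 2017.

If you have other CUDA versions installed, I suggest uninstalling them from the Control Panel.

CUDA Toolkit: https://developer.nvidia.com/cuda-92-download-archive?target_os=Windows&target_arch=x86_64&target_version=10

Installing cuDNN on Windows: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-windows (I have version cudnn-9.2-windows10-x64-v7.6.5.32).

After that, I compiled the project with these commands: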

cmake -DCMAKE_PREFIX_PATH=path\to\libtorch -Ax64 ..
cmake --build . --config Release

In my code I was then able to do:

testModel->to(torch::DeviceType::CUDA);

Be sure to compile in Release, so that the project matches the release build of LibTorch.
