encog - How to fix the encog "kernel launch failure" error at runtime: "./encog benchmark /gpu:1"

Question

As part of testing my encog installation, I tried running ./encog benchmark /gpu:0, which worked fine, but when I tried ./encog benchmark /gpu:1, I got:

encog-core/cuda_eval.cu(286) : getLastCudaError() CUDA error : kernel launch failure : (13) invalid device symbol.

I am on Ubuntu 11.10. I got the source from https://github.com/encog/encog-c, and "make ARCH=64 CUDA=1" completed without errors.

Thanks for any help you can offer in resolving this.

Here is the console listing of the benchmark that runs fine:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:0

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * *
Copyright 2012 by Heaton Research, Released under the Apache License
Build Date: May 4 2013 07:24:00
Processor/Core Count: 32
Basic Data Type: double (64 bits)
GPU: disabled
Input Count: 10
Ideal Count: 1
Records: 10000
Iterations: 100

Performing benchmark...please wait
Benchmark time(seconds): 3.2856
Benchmark time includes only training time.

Encog Finished. Run time 00:00:03.2904

==============================================

Here is the benchmark that has the problem:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog benchmark /gpu:1

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * *
Copyright 2012 by Heaton Research, Released under the Apache License
Build Date: May 4 2013 07:24:00
Processor/Core Count: 32
Basic Data Type: double (64 bits)
GPU: enabled
Input Count: 10
Ideal Count: 1
Records: 10000
Iterations: 100

Performing benchmark...please wait
encog-core/cuda_eval.cu(286) : getLastCudaError() CUDA error : kernel launch failure : (13) invalid device symbol.

===========================================

Here is what my GPU environment looks like:

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ ./encog cuda

* * Encog C/C++ (64 bit, CUDA) Command Line v1.0 * *
Copyright 2012 by Heaton Research, Released under the Apache License
Build Date: May 4 2013 07:24:00
Processor/Core Count: 32
Basic Data Type: double (64 bits)
GPU: enabled
Device 0: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 1: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 2: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 3: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 4: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 5: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 6: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Device 7: GeForce GTX 690
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)

( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock Speed: 1.02 GHz
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Performing CUDA test.
Vector Addition
CUDA Vector Add Test was successful.
Encog Finished. Run time 00:00:10.9206

================================

Here is the output of my "make":

rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$ make ARCH=64 CUDA=1
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/encog-cmd.o encog-cmd/encog-cmd.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/cuda_test.o encog-cmd/cuda_test.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-cmd
gcc -c -o obj-cmd/node_unix.o encog-cmd/node_unix.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-cmd
/usr/local/cuda/bin/nvcc -o obj-cmd/cuda_vecadd.cu.o -c encog-cmd/cuda_vecadd.cu -I./encog-core/ -m64
mkdir -p ./obj-lib
gcc -c -o obj-lib/activation.o encog-core/activation.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/errorcalc.o encog-core/errorcalc.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/network_io.o encog-core/network_io.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util.o encog-core/util.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util_str.o encog-core/util_str.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/data.o encog-core/data.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/errors.o encog-core/errors.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/network.o encog-core/network.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/pso.o encog-core/pso.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/util_file.o encog-core/util_file.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/vector.o encog-core/vector.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/encog.o encog-core/encog.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/nm.o encog-core/nm.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/object.o encog-core/object.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/rprop.o encog-core/rprop.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/hash.o encog-core/hash.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
gcc -c -o obj-lib/train.o encog-core/train.c -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include
mkdir -p ./obj-lib
/usr/local/cuda/bin/nvcc -o obj-lib/encog_cuda.cu.o -c encog-core/encog_cuda.cu -I./encog-core/ -m64
mkdir -p ./obj-lib
/usr/local/cuda/bin/nvcc -o obj-lib/cuda_eval.cu.o -c encog-core/cuda_eval.cu -I./encog-core/ -m64
ptxas /tmp/tmpxft_00001b04_00000000-5_cuda_eval.ptx, line 141; warning : Double is not supported. Demoting to float
mkdir -p ./lib
ar rcs ./lib/encog.a ./obj-lib/activation.o ./obj-lib/errorcalc.o ./obj-lib/network_io.o ./obj-lib/util.o ./obj-lib/util_str.o ./obj-lib/data.o ./obj-lib/errors.o ./obj-lib/network.o ./obj-lib/pso.o ./obj-lib/util_file.o ./obj-lib/vector.o ./obj-lib/encog.o ./obj-lib/nm.o ./obj-lib/object.o ./obj-lib/rprop.o ./obj-lib/hash.o ./obj-lib/train.o ./obj-lib/encog_cuda.cu.o ./obj-lib/cuda_eval.cu.o
gcc -o encog obj-cmd/encog-cmd.o obj-cmd/cuda_test.o obj-cmd/node_unix.o obj-cmd/cuda_vecadd.cu.o lib/encog.a -I./encog-core/ -fopenmp -std=gnu99 -pedantic -O3 -Wall -m64 -DENCOG_CUDA=1 -I/usr/local/cuda/include -lm ./lib/encog.a -L/usr/local/cuda/lib64 -lcudart
rick@rick-cuda:~/a01-neuralnet-encog/encog-c-master$

score 1 · Accepted Answer

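I tried running it on my GeForce 580 and had no problems. We are on different platforms, since you have a 6-series card. I searched for the error in a few places on Google. It looks like there may be an issue with how local memory is used, and it might not work on the 6 series. You may want to file an issue here: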

https://github.com/encog/encog-c/issues
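
One thing that may be worth checking before filing: the make output in the question shows nvcc being called without an -arch flag, and ptxas warns "Double is not supported. Demoting to float", which suggests the kernels were built for nvcc's default low-end target and not for the compute capability 3.0 that ./encog cuda reports. The sketch below is only an assumption about what to try, not a confirmed fix. The helper nvcc_arch_flag is hypothetical (it is not part of encog-c), and the nvcc path and file names are copied from the make output. It only prints the recompile command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of encog-c): turn a compute capability in
# "major.minor" form, as printed by `./encog cuda`, into an nvcc -arch flag.
nvcc_arch_flag() {
    cc="$1"
    major="${cc%%.*}"   # text before the first dot, e.g. "3"
    minor="${cc#*.}"    # text after the first dot, e.g. "0"
    printf '%s%s%s\n' "-arch=sm_" "$major" "$minor"
}

# nvcc location as it appears in the make output (an assumption for other setups).
NVCC=/usr/local/cuda/bin/nvcc
ARCH_FLAG="$(nvcc_arch_flag 3.0)"

# Print, but do not run, the recompile command for the file that produced
# the ptxas warning, with the architecture flag added.
echo "$NVCC $ARCH_FLAG -o obj-lib/cuda_eval.cu.o -c encog-core/cuda_eval.cu -I./encog-core/ -m64"
```

If the printed command is run by hand, the ar and final gcc link steps from the make output would need to be repeated afterwards so the new object file ends up in the encog binary. Whether this removes the "invalid device symbol" error is untested here.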
