There are slight discrepancies between the documentation for deploying PaddlePaddle with Docker [1] and the documentation for installing PaddlePaddle manually from source [2].
The Docker deployment documentation states that, after pulling the image from Docker Hub:
docker pull paddledev/paddle
the following environment variables should be set and included in the docker run command:
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
The export commands look for libcuda* and libnvidia* in /usr/lib64/, but according to the source-compilation documentation, lib64/ should be located at /usr/local/cuda/lib64.
Regardless, the location of lib64/ can be found with:
cat /etc/ld.so.conf.d/cuda.conf
Additionally, the export command looks for libnvidia*, which doesn't seem to exist anywhere in /usr/local/cuda/ except for libnvidia-ml.so:
/usr/local/cuda$ find . -name 'libnvidia*'
./lib64/stubs/libnvidia-ml.so
I suppose the files that CUDA_SO is actually meant to pick up are:

- /usr/local/cuda/lib64/libcudart.so.8.0
- /usr/local/cuda/lib64/libcudart.so.7.5

But is that right? What should the CUDA_SO environment variable(s) contain to deploy PaddlePaddle with GPU support?
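If that guess is right, one way to test it would be to rebuild CUDA_SO from the toolkit path reported by cuda.conf rather than the hard-coded /usr/lib64. A minimal sketch, assuming the toolkit lives at /usr/local/cuda/lib64 (CUDA_LIB is my own variable, not something from the docs):

```shell
# Sketch: rebuild the -v mount flags from the toolkit path reported by
# /etc/ld.so.conf.d/cuda.conf instead of the hard-coded /usr/lib64.
# CUDA_LIB is an assumed path -- verify it on your own host first.
CUDA_LIB=${CUDA_LIB:-/usr/local/cuda/lib64}
export CUDA_SO="$(\ls "$CUDA_LIB"/libcuda* 2>/dev/null | xargs -I{} echo '-v {}:{}')"
export DEVICES="$(\ls /dev/nvidia* 2>/dev/null | xargs -I{} echo '--device {}:{}')"
# Then start the container with the same run line as the docs:
# docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
```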
Even after pointing CUDA_SO at the libcudart* files, the Docker container still doesn't seem to find the GPU driver:
user0@server1:~/dockdock$ echo CUDA_SO="$(\ls $CUDA_CONFILE/libcuda* | xargs -I{} echo '-v {}:{}')"
CUDA_SO=-v /usr/local/cuda/lib64/libcudadevrt.a:/usr/local/cuda/lib64/libcudadevrt.a
-v /usr/local/cuda/lib64/libcudart.so:/usr/local/cuda/lib64/libcudart.so
-v /usr/local/cuda/lib64/libcudart.so.8.0:/usr/local/cuda/lib64/libcudart.so.8.0
-v /usr/local/cuda/lib64/libcudart.so.8.0.44:/usr/local/cuda/lib64/libcudart.so.8.0.44
-v /usr/local/cuda/lib64/libcudart_static.a:/usr/local/cuda/lib64/libcudart_static.a
user0@server1:~/dockdock$ export CUDA_SO="$(\ls $CUDA_CONFILE/libcuda* | xargs -I{} echo '-v {}:{}')"
user0@server1:~/dockdock$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
user0@server1:~/dockdock$ docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
root@bd25dfd4f824:/# git clone https://github.com/baidu/Paddle paddle
Cloning into 'paddle'...
remote: Counting objects: 26626, done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 26626 (delta 3), reused 0 (delta 0), pack-reused 26603
Receiving objects: 100% (26626/26626), 25.41 MiB | 4.02 MiB/s, done.
Resolving deltas: 100% (18786/18786), done.
Checking connectivity... done.
root@bd25dfd4f824:/# cd paddle/demo/quick_start/
root@bd25dfd4f824:/paddle/demo/quick_start# sed -i 's|--use_gpu=false|--use_gpu=true|g' train.sh
root@bd25dfd4f824:/paddle/demo/quick_start# bash train.sh
I0410 09:25:37.300365 48 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lr.py --save_dir=./output --trainer_count=4 --log_period=100 --num_passes=15 --use_gpu=true --show_parameter_stats_period=100 --test_all_data_in_one_period=1
F0410 09:25:37.300940 48 hl_cuda_device.cc:526] Check failed: cudaSuccess == cudaStat (0 vs. 35) Cuda Error: CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
@ 0x7efc20557daa (unknown)
@ 0x7efc20557ce4 (unknown)
@ 0x7efc205576e6 (unknown)
@ 0x7efc2055a687 (unknown)
@ 0x895560 hl_specify_devices_start()
@ 0x89576d hl_start()
@ 0x80f402 paddle::initMain()
@ 0x52ac5b main
@ 0x7efc1f763f45 (unknown)
@ 0x540c05 (unknown)
@ (nil) (unknown)
/usr/local/bin/paddle: line 109: 48 Aborted (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
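My reading of the "CUDA driver version is insufficient for CUDA runtime version" error is that the container sees the CUDA runtime (libcudart) but not the driver-side library libcuda.so, which is installed with the NVIDIA driver outside /usr/local/cuda (commonly under /usr/lib/x86_64-linux-gnu or /usr/lib64). A sketch of how one might locate those driver libraries on the host before mounting them, assuming ldconfig is available:

```shell
# Sketch: the driver-side libraries (libcuda.so, libnvidia*) ship with the
# NVIDIA driver, not the CUDA toolkit, so search the loader cache for them.
ldconfig -p | grep -E 'libcuda\.so|libnvidia' \
  || echo "no driver libraries in the loader cache"
```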
[1]: http://www.paddlepaddle.org/doc/build/docker_install.html
[2]: http://paddlepaddle.org/doc/build/build_from_source.html
How to deploy a PaddlePaddle Docker container with GPU support?
Also, in Chinese: https://github.com/PaddlePaddle/Paddle/issues/1764