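I compiled jBLAS against NVBLAS using a somewhat hacky workaround, because the configure script did not find the libraries correctly. I manually edited jBLAS's configure.out file as follows to include the NVBLAS library: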
BUILD_TYPE=nvblas
CC=gcc
CCC=c99
CFLAGS=-fPIC -DHAS_CPUID
F77=gfortran
FOUND_JAVA=true
FOUND_NM=true
INCDIRS=-Iinclude -I/usr/lib/jvm/java-11-openjdk-amd64/include -I/usr/lib/jvm/java-11-openjdk-amd64/include/linux
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
LAPACK_HOME=./lapack-lite-3.1.1
LD=gcc
LDFLAGS=-shared
LIB=lib
LINKAGE_TYPE=static
LOADLIBES=-Wl,-z,muldefs /home/linyi/jblas/lapack-lite-3.1.1/lapack_LINUX.a /usr/local/cuda-11.0/lib64/libnvblas.so.11 /home/linyi/jblas/lapack-lite-3.1.1/blas_LINUX.a -lgfortran
MAKE=make
NM=nm
OS_ARCH=amd64
OS_ARCH_WITH_FLAVOR=amd64/sse3
OS_NAME=Linux
RUBY=ruby
SO=so
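The LOADLIBES line above places libnvblas.so.11 between the two static lapack-lite archives and passes -Wl,-z,muldefs. The GNU ld manual states that with -z muldefs the first definition of a multiply-defined symbol is used, and a static archive member is only pulled in for symbols that are still undefined when that archive is scanned. As a result, the relative order of libnvblas and blas_LINUX.a in LOADLIBES is what decides which one supplies dgemm_ and the other Level-3 routines. The following is a minimal sketch, with hypothetical class and method names, that parses the KEY=VALUE layout shown above and checks that ordering. It only inspects the options text, not the built libjblas.so.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkOrderCheck {

    // Parse KEY=VALUE lines (the layout of the options file above) into an ordered map.
    static Map<String, String> parseOptions(String text) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                options.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return options;
    }

    // True if a libnvblas* entry appears in LOADLIBES before the named reference BLAS archive
    // (or if that archive is absent). Entries are compared by file name only.
    static boolean nvblasPrecedesReferenceBlas(Map<String, String> options, String referenceBlasName) {
        String[] entries = options.getOrDefault("LOADLIBES", "").split("\\s+");
        int nvblasIndex = -1;
        int blasIndex = -1;
        for (int i = 0; i < entries.length; i++) {
            String file = entries[i].substring(entries[i].lastIndexOf('/') + 1);
            if (nvblasIndex < 0 && file.startsWith("libnvblas")) nvblasIndex = i;
            if (blasIndex < 0 && file.equals(referenceBlasName)) blasIndex = i;
        }
        return nvblasIndex >= 0 && (blasIndex < 0 || nvblasIndex < blasIndex);
    }

    public static void main(String[] args) {
        // Excerpt of the edited options file from the post.
        String options = String.join("\n",
            "BUILD_TYPE=nvblas",
            "LINKAGE_TYPE=static",
            "LOADLIBES=-Wl,-z,muldefs /home/linyi/jblas/lapack-lite-3.1.1/lapack_LINUX.a "
                + "/usr/local/cuda-11.0/lib64/libnvblas.so.11 "
                + "/home/linyi/jblas/lapack-lite-3.1.1/blas_LINUX.a -lgfortran");
        Map<String, String> parsed = parseOptions(options);
        System.out.println("BUILD_TYPE = " + parsed.get("BUILD_TYPE"));
        System.out.println("libnvblas before blas_LINUX.a: "
            + nvblasPrecedesReferenceBlas(parsed, "blas_LINUX.a"));
    }
}
```

For the options above the check passes, which is consistent with nvblas.log later showing that NVBLAS does intercept the calls.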
Then I ran make clean all followed by mvn clean package, as documented here. The tests pass, but the process crashes with a segmentation fault on exit:
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.jblas.TestEigen
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to '/home/linyi/nvblas.conf'
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.569 sec
Running org.jblas.TestComplexFloat
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestDecompose
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
Running org.jblas.TestBlasDouble
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running org.jblas.TestBlasDoubleComplex
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestSingular
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
Running org.jblas.TestDoubleMatrix
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec
Running org.jblas.TestSolve
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestBlasFloat
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestFloatMatrix
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec
Running org.jblas.SimpleBlasTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.jblas.ranges.RangeTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running org.jblas.TestGeometry
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.ComplexDoubleMatrixTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas_arch_flavor.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fa1f6bb96b1, pid=8063, tid=8072
#
# JRE version: OpenJDK Runtime Environment (11.0.8+10) (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libcublas.so.11+0xa096b1]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/linyi/jblas/jblas/core.8063)
#
# An error report file with more information is saved as:
# /home/linyi/jblas/jblas/hs_err_pid8063.log
#
# If you would like to submit a bug report, please visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#
Aborted (core dumped)
Results :
Tests run: 122, Failures: 0, Errors: 0, Skipped: 0
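I then decided to run mvn clean package -DskipTests, since the tests themselves appeared to pass and only the process crashed with a segmentation fault on termination. However, when I used the resulting library in my own Java project, nvblas.log showed that although NVBLAS intercepts the calls to the BLAS routines, they are actually executed on the CPU rather than the GPU. Running my program under nvprof --print-gpu-summary led to the same conclusion: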
==7711== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 1.8240us 1 1.8240us 1.8240us 1.8240us [CUDA memcpy HtoD]
======== Error: Application received signal 134
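For context, NVIDIA's NVBLAS documentation describes its CPU/GPU routing as a heuristic that estimates whether a Level-3 call will run long enough to amortize the PCIe transfer of its input and output data. The exact threshold is not stated here, and the sketch below does not assume one. It only computes a rough arithmetic intensity (floating-point operations per byte moved) for a dgemm, assuming no operand is already resident on the GPU and C is both uploaded and downloaded. The class name is hypothetical, and m=54, n=22, k=31 is one of the calls recorded in nvblas.log.

```java
public class GemmIntensity {

    // Operations for C = alpha*op(A)*op(B) + beta*C: one multiply and one add per inner-product term.
    static long gemmFlops(long m, long n, long k) {
        return 2L * m * n * k;
    }

    // Bytes crossing PCIe for a double-precision call, assuming nothing is resident on the GPU:
    // op(A) is m x k, op(B) is k x n, and C (m x n) is copied in and copied back out.
    static long gemmBytes(long m, long n, long k) {
        long elements = m * k + k * n + 2L * m * n;
        return elements * Double.BYTES;
    }

    static double flopsPerByte(long m, long n, long k) {
        return (double) gemmFlops(m, n, k) / gemmBytes(m, n, k);
    }

    public static void main(String[] args) {
        // A call size taken from nvblas.log.
        System.out.printf("m=54, n=22, k=31: %.2f flop/byte%n", flopsPerByte(54, 22, 31));
        // A hypothetical large square multiply, for comparison.
        System.out.printf("m=n=k=4096:       %.2f flop/byte%n", flopsPerByte(4096, 4096, 4096));
    }
}
```

The logged call works out to roughly 2 flop/byte, while the square case grows as n/16 (256 flop/byte at n=4096). This suggests why matrices this small may stay on the CPU regardless of how NVBLAS is linked.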
The contents of nvblas.log are as follows:
[NVBLAS] Using devices :0
[NVBLAS] Config parsed
[NVBLAS] dgemm[cpu]: ta=N, tb=N, m=1, n=1, k=1
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=24, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=32, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=26, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=22, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=20, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=26, k=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=60, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=60, n=31
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=22, k=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=60, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=54, n=22
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=54, n=22, k=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=60, n=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=52, n=20
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=54, n=22
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=60, n=28, k=31
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=52, n=20, k=31
[NVBLAS] dgemm[cpu]: ta=N, tb=T, m=31, n=54, k=22
. . .
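With a long trace like the one above, it can help to count how many calls of each routine were routed to each target. The following is a minimal sketch with hypothetical names that matches lines of the form "[NVBLAS] dgemm[cpu]: ..." and tallies them per routine and target. It assumes that offloaded calls are tagged the same way (for example [gpu]), which this excerpt does not show, and it skips non-call lines such as "Config parsed".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NvblasLogSummary {

    // Matches trace lines such as "[NVBLAS] dgemm[cpu]: ta=N, tb=N, m=1, n=1, k=1".
    private static final Pattern CALL = Pattern.compile("^\\[NVBLAS\\] (\\w+)\\[(\\w+)\\]:");

    // Count calls per "routine[target]" key, e.g. "dtrmm[cpu]".
    static Map<String, Integer> summarize(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = CALL.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group(1) + "[" + m.group(2) + "]", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the log file given on the command line, or fall back to a short excerpt.
        String log = args.length > 0
            ? Files.readString(Path.of(args[0]))
            : String.join("\n",
                "[NVBLAS] Using devices :0",
                "[NVBLAS] Config parsed",
                "[NVBLAS] dgemm[cpu]: ta=N, tb=N, m=1, n=1, k=1",
                "[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=24, k=28",
                "[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=52, n=31");
        summarize(log).forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

With Java 11 it can be run directly from source, for example java NvblasLogSummary.java nvblas.log.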
I really don't know what to do next, and this looks like a serious problem. I would appreciate any advice.