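I did two installations:

  1. brew install numpy (and scipy) --with-openblas
  2. Cloned the Git repositories (for numpy and scipy) and built them myself

After that I cloned two handy scripts for verifying these libraries in a multithreaded environment: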

git clone https://gist.github.com/3842524.git

Then for each installation I ran show_config:

python -c "import scipy as np; np.show_config()"

For installation 1 everything looks fine:

lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/opt/openblas/lib']
    language = f77
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/opt/openblas/lib']
    language = f77
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/opt/openblas/lib']
    language = f77
blas_mkl_info:
    NOT AVAILABLE

But for installation 2 things are not so bright:

lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3)]
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]

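So it seems I failed to link OpenBLAS correctly in the second installation. But that is fine for now; here are the performance results. All tests were run on an iMac, Yosemite, i7-4790K, 4 cores, hyper-threaded.

First installation, with OpenBLAS:

numpy: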

OMP_NUM_THREADS=1 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.126578998566 sec

OMP_NUM_THREADS=2 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0640147686005 sec

OMP_NUM_THREADS=4 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0360922336578 sec

OMP_NUM_THREADS=8 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0364527702332 sec

scipy:

OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0276656150818 sec
svd: 0.732437372208 sec

OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.0182101726532 sec
svd: 0.441690778732 sec

OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.0130400180817 sec
svd: 0.316107988358 sec

OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.012854385376 sec
svd: 0.315939807892 sec

Second installation, without OpenBLAS:

numpy:

OMP_NUM_THREADS=1 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0371072292328 sec

OMP_NUM_THREADS=2 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0215149879456 sec

OMP_NUM_THREADS=4 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0146862030029 sec

OMP_NUM_THREADS=8 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0141334056854 sec

scipy:

OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0109382152557 sec
svd: 0.32529540062 sec

OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.00988121032715 sec
svd: 0.331357002258 sec

OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.00916676521301 sec
svd: 0.318637990952 sec

OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.00931282043457 sec
svd: 0.324427986145 sec

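To my surprise, the second case is faster than the first. In the scipy case, adding more cores does not improve performance, but even one core is faster than 4 cores with OpenBLAS.

Does anyone know why this is?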
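
As a rough way to put numbers on this comparison, the sketch below computes the speedup of the dot benchmark relative to a single thread for both installations. The timings are copied from the runs above; the labels "OpenBLAS" and "Accelerate" (the latter taken from the extra_link_args in the second show_config output) and the helper name speedup are my own, not part of the test scripts.

```python
# Timings in seconds for the `dot` benchmark, keyed by OMP_NUM_THREADS.
# Values are copied from the test_numpy.py runs above.
dot_times = {
    "OpenBLAS": {
        1: 0.126578998566,
        2: 0.0640147686005,
        4: 0.0360922336578,
        8: 0.0364527702332,
    },
    "Accelerate": {
        1: 0.0371072292328,
        2: 0.0215149879456,
        4: 0.0146862030029,
        8: 0.0141334056854,
    },
}


def speedup(times, threads):
    """Return how many times faster `threads` threads are than one thread."""
    return times[1] / times[threads]


for name, times in dot_times.items():
    for n in sorted(times):
        print("{:<10} threads={} time={:.4f}s speedup={:.2f}x".format(
            name, n, times[n], speedup(times, n)))
```

Both builds stop improving beyond 4 threads, which matches the 4 physical cores of the machine. The same helper could be applied to the cholesky and svd timings to compare the two installations thread count by thread count.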

1 Answer

There are two obvious differences that might account for the discrepancy:

  1. You are comparing two different versions of numpy. The OpenBLAS-linked version you installed using Homebrew is 1.9.2, whereas the one you built from source is 1.10.0.dev0+3c5409e.

  2. Whilst the newer version is not linked against OpenBLAS, it is linked against Apple's Accelerate Framework, a different optimized BLAS implementation.


Your test script still reports slow blas in the second case because it is incompatible with the newest versions of numpy. It tests whether numpy is linked against an optimised BLAS library by checking for the presence of numpy.core._dotblas:

try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

In older versions of numpy, this C module was only compiled during installation if an optimized BLAS library was found. However, _dotblas has been removed altogether in the 1.10.0 development versions (as mentioned in this previous SO question), so the script will always report slow blas for these versions.

I've written an updated version of the numpy test script that reports the BLAS linkage correctly for the latest versions; you can find it here.
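
Independently of that script, here is a minimal sketch of a check that does not rely on _dotblas: it reads the text printed by show_config() and looks at the link settings of the blas_opt_info section. It assumes Python 3 and the section layout shown in the question's output; other numpy versions may print a different layout. The function names (link_lines, detect_blas, captured_show_config) and the list of library keywords are my own choices, and the substring match is only a heuristic.

```python
import contextlib
import io


def link_lines(text, section="blas_opt_info"):
    """Return the 'libraries' and 'extra_link_args' lines of one section."""
    lines = []
    in_section = False
    for raw in text.splitlines():
        if not raw.strip():
            continue
        # Section headers are unindented and end with a colon.
        is_header = not raw[0].isspace() and raw.rstrip().endswith(":")
        if is_header:
            in_section = raw.strip()[:-1] == section
        elif in_section and raw.strip().startswith(("libraries", "extra_link_args")):
            lines.append(raw.strip().lower())
    return lines


def detect_blas(text):
    """Guess the BLAS backend from show_config()-style text (heuristic)."""
    joined = " ".join(link_lines(text))
    for name in ("openblas", "mkl", "accelerate", "veclib", "atlas"):
        if name in joined:
            return name
    return None


def captured_show_config(module):
    """Return whatever module.show_config() prints, as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        module.show_config()
    return buf.getvalue()
```

With numpy installed, detect_blas(captured_show_config(numpy)) would give a guess for the running build. Only the libraries and extra_link_args lines are inspected, so a macro such as NO_ATLAS_INFO or a section marked NOT AVAILABLE does not trigger a false match.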

answered 2015-04-14T22:01:31.400