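I would like to know whether it is possible to change, at (Python) runtime, the maximum number of threads used by OpenBLAS behind numpy.
I know it is possible to set it before starting the interpreter through the environment variable OMP_NUM_THREADS, but I would like to change it at runtime.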
Typically, when using MKL instead of OpenBLAS, this is possible:
import mkl
mkl.set_num_threads(n)
You can do this by calling the openblas_set_num_threads function using ctypes. I often find myself wanting to do this, so I wrote a little context manager:
import contextlib
import ctypes
import errno
from ctypes.util import find_library

# Prioritize hand-compiled OpenBLAS library over version in /usr/lib/
# from Ubuntu repos
try_paths = ['/opt/OpenBLAS/lib/libopenblas.so',
             '/lib/libopenblas.so',
             '/usr/lib/libopenblas.so.0',
             find_library('openblas')]
openblas_lib = None
for libpath in try_paths:
    # find_library() returns None if nothing is found, and
    # LoadLibrary(None) would silently load the main program instead
    if libpath is None:
        continue
    try:
        openblas_lib = ctypes.cdll.LoadLibrary(libpath)
        break
    except OSError:
        continue
if openblas_lib is None:
    # OSError/EnvironmentError takes (errno, message), in that order
    raise EnvironmentError(errno.ENOENT,
                           'Could not locate an OpenBLAS shared library')


def set_num_threads(n):
    """Set the current number of threads used by the OpenBLAS server."""
    openblas_lib.openblas_set_num_threads(int(n))


# At the time of writing these symbols were very new:
# https://github.com/xianyi/OpenBLAS/commit/65a847c
try:
    openblas_lib.openblas_get_num_threads()

    def get_num_threads():
        """Get the current number of threads used by the OpenBLAS server."""
        return openblas_lib.openblas_get_num_threads()
except AttributeError:
    def get_num_threads():
        """Dummy function (symbol not present), returns -1."""
        return -1

try:
    openblas_lib.openblas_get_num_procs()

    def get_num_procs():
        """Get the total number of physical processors."""
        return openblas_lib.openblas_get_num_procs()
except AttributeError:
    def get_num_procs():
        """Dummy function (symbol not present), returns -1."""
        return -1


@contextlib.contextmanager
def num_threads(n):
    """Temporarily changes the number of OpenBLAS threads.

    Example usage:

        print("Before: {}".format(get_num_threads()))
        with num_threads(n):
            print("In thread context: {}".format(get_num_threads()))
        print("After: {}".format(get_num_threads()))
    """
    old_n = get_num_threads()
    set_num_threads(n)
    try:
        yield
    finally:
        # Restore the previous setting even if the with-body raises
        set_num_threads(old_n)
You can use it like this:

with num_threads(8):
    np.dot(x, y)
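As noted in the code comments, openblas_get_num_threads and openblas_get_num_procs were very new at the time of writing. They may therefore be missing unless you compiled OpenBLAS from a recent version of the source code, in which case the fallback get_num_threads and get_num_procs simply return -1.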
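The try/except blocks in the answer probe for these newer symbols by actually calling them. hasattr() is a side-effect-free alternative, because ctypes resolves a symbol on attribute access and raises AttributeError when it is missing. To keep the sketch runnable without OpenBLAS installed, it uses the C library as a stand-in. It assumes a Unix-like system where find_library("c") locates libc.

```python
import ctypes
from ctypes.util import find_library

# Assumption: a Unix-like system where find_library("c") finds the C library.
# libc stands in for libopenblas so the sketch runs without OpenBLAS.
libc = ctypes.CDLL(find_library("c"))

# Attribute lookup on a ctypes library resolves the symbol (dlsym on POSIX)
# and raises AttributeError if it is missing, so hasattr() can probe for an
# optional function without calling it.
has_strlen = hasattr(libc, "strlen")
has_openblas_getter = hasattr(libc, "openblas_get_num_threads")

print(has_strlen, has_openblas_getter)  # -> True False
```

Applied to the answer's code, the probe would read if hasattr(openblas_lib, 'openblas_get_num_threads'): followed by the real definition, with the dummy definition in the else branch.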
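We recently developed threadpoolctl, a cross-platform package for controlling the number of threads used by C-level thread pools called from Python. It works similarly to @ali_m's answer, but it automatically detects which libraries need to be limited by looping over all loaded libraries. It also comes with introspection APIs.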
The package can be installed with pip install threadpoolctl. It comes with a context manager that lets you control the number of threads used by packages such as numpy:
from threadpoolctl import threadpool_limits
import numpy as np

with threadpool_limits(limits=1, user_api='blas'):
    # In this block, calls to blas implementation (like openblas or MKL)
    # will be limited to use only one thread. They can thus be used jointly
    # with thread-parallelism.
    a = np.random.randn(1000, 1000)
    a_squared = a @ a
You also get finer control over the individual thread pools, for example by distinguishing blas calls from openmp calls.
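The sketch below combines the introspection API with per-pool limits. It assumes threadpoolctl and numpy are installed. It relies on threadpool_info() returning one dict per detected pool with "user_api", "internal_api" and "num_threads" keys, and on threadpool_limits accepting a dict keyed by user API ("blas", "openmp"). The helper describe_pools is a hypothetical name used only for illustration.

```python
from threadpoolctl import threadpool_info, threadpool_limits

import numpy  # noqa: F401  (importing numpy loads its BLAS so it is detected)


def describe_pools():
    """Hypothetical helper: (user_api, internal_api, num_threads) per pool."""
    return [
        (pool["user_api"], pool["internal_api"], pool["num_threads"])
        for pool in threadpool_info()
    ]


print("before:", describe_pools())

# Separate limits per user API: BLAS pools get 1 thread, OpenMP pools get 2.
with threadpool_limits(limits={"blas": 1, "openmp": 2}):
    print("inside:", describe_pools())

print("after:", describe_pools())
```

The exact pools listed depend on how numpy and its BLAS were built, so no output is shown here.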
Note: this package is still under development, and any feedback is welcome.