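This is an MWE of a larger code I'm using. It performs a Monte Carlo integration over a KDE (kernel density estimate) for all values located below a certain threshold (the integration method was suggested in this question: Integrate 2D kernel density estimation), iterating over a number of points in a list and returning a list made of these results.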

    import numpy as np
    from scipy import stats
    from multiprocessing import Pool
    import threading

    # Define KDE integration function.
    def kde_integration(m_list):

        # Put some of the values from the m_list into two new lists.
        m1, m2 = [], []
        for item in m_list:
            # x data.
            m1.append(item[0])
            # y data.
            m2.append(item[1])

        # Define limits.
        xmin, xmax = min(m1), max(m1)
        ymin, ymax = min(m2), max(m2)

        # Perform a kernel density estimate on the data:
        x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
        values = np.vstack([m1, m2])
        kernel = stats.gaussian_kde(values)

        # This list will be returned at the end of this function.
        out_list = []

        # Iterate through all points in the list and calculate for each the integral
        # of the KDE for the domain of points located below the value of that point
        # in the KDE.
        for point in m_list:

            # Compute the point below which to integrate.
            iso = kernel((point[0], point[1]))

            # Sample KDE distribution
            sample = kernel.resample(size=1000)

            # Choose number of cores and split input array.
            cores = 4
            torun = np.array_split(sample, cores, axis=1)

            # Print number of active threads.
            print threading.active_count()

            # Calculate
            pool = Pool(processes=cores)
            results = pool.map(kernel, torun)

            # Reintegrate and calculate results
            insample_mp = np.concatenate(results) < iso

            # Integrate for all values below iso.
            integral = insample_mp.sum() / float(insample_mp.shape[0])

            # Append integral value for this point to list that will return.
            out_list.append(integral)

        return out_list


    # Generate some random two-dimensional data:
    def measure(n):
        "Measurement model, return two coupled measurements."
        m1 = np.random.normal(size=n)
        m2 = np.random.normal(scale=0.5, size=n)
        return m1+m2, m1-m2

    # Create list to pass to KDE integral function.
    m_list = []
    for i in range(100):
        m1, m2 = measure(5)
        m_list.append(m1.tolist())
        m_list.append(m2.tolist())

    # Call KDE integration function.
    print 'Integral result: ', kde_integration(m_list)

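To make the Monte Carlo step inside the loop easier to follow, here is a minimal standalone sketch of the same idea on a distribution with a known answer. It is not the KDE code itself: it assumes a one-dimensional standard normal in place of the KDE, it is written for Python 3 using only the standard library, and the names `normal_pdf`, `mc_integral_below`, and `exact_integral_below` are made up for illustration.

```python
import math
import random


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mc_integral_below(x0, n_samples=20000, seed=0):
    """Estimate the probability mass where the density is below pdf(x0)."""
    rng = random.Random(seed)
    # Threshold density, playing the role of `iso` in the question.
    iso = normal_pdf(x0)
    # Draw samples from the distribution itself (the role of kernel.resample).
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Count the samples whose density is lower than the threshold.
    below = sum(1 for s in samples if normal_pdf(s) < iso)
    # The fraction of such samples approximates the integral.
    return below / float(n_samples)


def exact_integral_below(x0):
    """Closed form for the standard normal: P(|X| > |x0|)."""
    return math.erfc(abs(x0) / math.sqrt(2.0))
```

Comparing `mc_integral_below(1.0)` with `exact_integral_below(1.0)` gives a rough feel for how close the sampled fraction gets to the true integral for a given number of samples.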
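The `multiprocessing` in the code was suggested in this question: Speed up sampling of kernel estimate, in order to speed up the code (which it does, by up to ~3.4x).
The code works fine until I try to pass a list of more than ~62-63 elements to the KDE function (i.e., I set a value over 63 in the line `for i in range(100)`). If I do that, I get the following error: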

    Traceback (most recent call last):
      File "~/gauss_kde_temp.py", line 78, in <module>
        print 'Integral result: ', kde_integration(m_list)
      File "~/gauss_kde_temp.py", line 48, in kde_integration
        pool = Pool(processes=cores)
      File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
        return Pool(processes, initializer, initargs, maxtasksperchild)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 144, in __init__
        self._worker_handler.start()
      File "/usr/lib/python2.7/threading.py", line 494, in start
        _start_new_thread(self.__bootstrap, ())
    thread.error: can't start new thread

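This usually (9 times out of 10) happens at around 374 active threads. I'm way out of my league here coding-wise with Python and I have no clue how I could fix this issue. Any help will be very much appreciated.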
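One thing that may be worth trying, since the traceback points at the `Pool(processes=cores)` line inside the loop, is to create the pool once before the loop and close it afterwards, so the number of pools does not grow with the number of points. The sketch below is only an illustration under stated assumptions: it is written for Python 3, it requests the "fork" start method (POSIX only), and `evaluate`, `split`, and `run_with_single_pool` are hypothetical names, with `evaluate` standing in for the call to `kernel`.

```python
import multiprocessing
import threading


def evaluate(chunk):
    # Hypothetical stand-in for kernel(chunk); any picklable function works.
    return [x * x for x in chunk]


def split(seq, parts):
    """Split seq into `parts` roughly equal consecutive chunks."""
    size, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def run_with_single_pool(points, cores=2):
    """Process every point with one shared pool; also record thread counts."""
    # Assumption: "fork" start method, so no __main__ guard is needed here.
    ctx = multiprocessing.get_context("fork")
    # The pool is created once, outside the loop.
    pool = ctx.Pool(processes=cores)
    out_list, thread_counts = [], []
    try:
        for point in points:
            # Placeholder for kernel.resample(...): eight made-up values.
            sample = [point + k for k in range(8)]
            results = pool.map(evaluate, split(sample, cores))
            # Flatten the per-chunk results back into one list.
            values = [v for chunk in results for v in chunk]
            out_list.append(sum(values))
            thread_counts.append(threading.active_count())
    finally:
        # Release the pool's workers and helper threads when done.
        pool.close()
        pool.join()
    return out_list, thread_counts
```

With a single shared pool, the recorded thread counts would be expected to stay flat across iterations instead of climbing with every point.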
Add
I tried adding a `while` loop to prevent the code from using too many threads. What I did was replace the `print threading.active_count()` line with this bit of code:

    # Print number of active threads.
    exit_loop = True
    while exit_loop:
        if threading.active_count() < 300:
            exit_loop = False
        else:
            # Pause for 10 seconds.
            time.sleep(10.)
            print 'waiting: ', threading.active_count()

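The code halts when it reaches 302 active threads (i.e., it gets stuck inside the loop). I waited for more than 10 minutes and the code never exited the loop, and the number of active threads never dropped from 302. Shouldn't the number of active threads decrease after a while?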
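A small experiment can show whether waiting alone could ever bring the count down. As far as I can tell, each `Pool` starts a few helper threads that stay alive until the pool is closed and joined (or terminated), which would explain a count that never drops. The sketch below measures the active thread count before creating some pools, while they are open, and after `close()` and `join()`. It assumes Python 3 and the "fork" start method (POSIX only), and `square` and `thread_counts` are hypothetical names used only for this illustration.

```python
import multiprocessing
import threading


def square(x):
    # Stand-in for the real per-chunk work (kernel evaluation in the question).
    return x * x


def thread_counts(n_pools=3, processes=2):
    """Return active thread counts before, during, and after using pools."""
    # Assumption: "fork" start method, so no __main__ guard is needed here.
    ctx = multiprocessing.get_context("fork")
    before = threading.active_count()
    # Open several pools without closing them, as the loop in the question does.
    pools = [ctx.Pool(processes=processes) for _ in range(n_pools)]
    for pool in pools:
        pool.map(square, range(4))
    during = threading.active_count()
    # Explicitly shut every pool down and wait for its helper threads.
    for pool in pools:
        pool.close()
        pool.join()
    after = threading.active_count()
    return before, during, after
```

If `during` is larger than `before` and `after` returns to `before`, that would suggest the threads only go away once each pool is explicitly closed and joined, not with the passage of time.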