
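This is an MWE of a larger code I'm using. It performs a Monte Carlo integration over a KDE (kernel density estimate) for all values located below a certain threshold (the integration method was suggested in this question: Integrate 2D kernel density estimation), iterating over a number of points in a list and returning a list made up of these results.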

import numpy as np
from scipy import stats
from multiprocessing import Pool
import threading

# Define KDE integration function.
def kde_integration(m_list):

    # Put some of the values from the m_list into two new lists.
    m1, m2 = [], []
    for item in m_list:
        # x data.
        m1.append(item[0])
        # y data.
        m2.append(item[1])

    # Define limits.
    xmin, xmax = min(m1), max(m1)
    ymin, ymax = min(m2), max(m2)

    # Perform a kernel density estimate on the data:
    x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    values = np.vstack([m1, m2])
    kernel = stats.gaussian_kde(values)

    # This list will be returned at the end of this function.
    out_list = []

    # Iterate through all points in the list and calculate for each the integral
    # of the KDE for the domain of points located below the value of that point
    # in the KDE.
    for point in m_list:

        # Compute the point below which to integrate.
        iso = kernel((point[0], point[1]))

        # Sample KDE distribution
        sample = kernel.resample(size=1000)

        #Choose number of cores and split input array.
        cores = 4
        torun = np.array_split(sample, cores, axis=1)

        # Print number of active threads.
        print threading.active_count()

        #Calculate
        pool = Pool(processes=cores)
        results = pool.map(kernel, torun)

        #Reintegrate and calculate results
        insample_mp = np.concatenate(results) < iso

        # Integrate for all values below iso.
        integral = insample_mp.sum() / float(insample_mp.shape[0])

        # Append integral value for this point to list that will return.
        out_list.append(integral)

    return out_list


# Generate some random two-dimensional data:
def measure(n):
    "Measurement model, return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

# Create list to pass to KDE integral function.
m_list = []
for i in range(100):
    m1, m2 = measure(5)
    m_list.append(m1.tolist())
    m_list.append(m2.tolist())

# Call KDE integration function.
print 'Integral result: ', kde_integration(m_list)

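The multiprocessing in the code was suggested in this question (Speed up sampling of kernel estimate) to speed up the code, which it does (by up to ~3.4x).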

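The code works fine until I try to pass a list of more than ~62-63 elements to the KDE function (i.e., I set a value above 63 in the line for i in range(100)). If I do that, I get the following error: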

Traceback (most recent call last):
  File "~/gauss_kde_temp.py", line 78, in <module>
    print 'Integral result: ', kde_integration(m_list)
  File "~/gauss_kde_temp.py", line 48, in kde_integration
    pool = Pool(processes=cores)
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 144, in __init__
    self._worker_handler.start()
  File "/usr/lib/python2.7/threading.py", line 494, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

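This usually happens (9 times out of 10) at around 374 active threads. I'm way out of my depth with Python here, and I have no idea how to fix this. Any help would be much appreciated.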


Add

I tried adding a while loop to prevent the code from using too many threads. What I did was replace the line print threading.active_count() with this block of code:

    # Wait until fewer than 300 threads are active (needs import time).
    exit_loop = True
    while exit_loop:
        if threading.active_count() < 300:
            exit_loop = False
        else:
            # Pause for 10 seconds.
            time.sleep(10.)
            print 'waiting: ', threading.active_count()

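The code stops (i.e., gets stuck inside the loop) when it reaches 302 active threads. I waited for more than 10 minutes and the code never exited the loop; the number of active threads never dropped below 302. Shouldn't the number of active threads decrease after a while?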
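
One possible reading of this behaviour, offered as a sketch and not as a confirmed diagnosis of the code above: a Pool starts several helper threads when it is created, and they stay alive until the pool is closed and joined (or terminated). Creating a new Pool on every loop iteration without shutting it down would therefore make threading.active_count() grow, and waiting would not bring it back down. The example below is written for Python 3 and, as an assumption made so that it runs anywhere without spawning processes, uses ThreadPool (a subclass of Pool) as a stand-in. The names square, leaky_loop and tidy_loop are hypothetical.

```python
import threading
from multiprocessing.pool import ThreadPool  # subclass of Pool, no processes


def square(x):
    return x * x


def leaky_loop(n_iter):
    """Create a new pool on every iteration without closing it."""
    pools = []  # keep the pools alive, mimicking pools that never shut down
    for _ in range(n_iter):
        pool = ThreadPool(processes=2)
        pool.map(square, range(4))
        pools.append(pool)
    leaked = threading.active_count()
    # Shut everything down so the sketch itself leaves no threads behind.
    for pool in pools:
        pool.close()
        pool.join()
    return leaked


def tidy_loop(n_iter):
    """Create one pool, reuse it on every iteration, then shut it down."""
    pool = ThreadPool(processes=2)
    try:
        for _ in range(n_iter):
            pool.map(square, range(4))
        during = threading.active_count()
    finally:
        pool.close()
        pool.join()
    return during


baseline = threading.active_count()
leaked = leaky_loop(5)
after_leak = threading.active_count()
reused = tidy_loop(5)
after_tidy = threading.active_count()
print(baseline, leaked, after_leak, reused, after_tidy)
```

If this reading applies, moving pool = Pool(processes=cores) out of the for loop, or calling pool.close() and pool.join() after each pool.map, would be the corresponding change to try in kde_integration.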
