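I am trying to parallelize (in a simple way) my machine learning code, which originally uses the Shogun machine learning toolbox. There are many possible training configurations, so processing them sequentially is not a suitable approach. I have a learning machine object called mkl_object, whose parameters are updated from a list of grid parameter paths (paths) produced by a path generator I wrote, called gridObj.generateRandomGridPaths(). I want a multiprocessing setup in which one model is learned for each path. For example, the three paths in the list paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]] should produce three mkl_object models, each learned on a separate core. See the code and its error output below: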

from multiprocessing import Pool
#from functools import partial # I already tried with partial and parmap
#import parmap as par
# My Machine learning and random grid search modules:
from mklObj import *
from gridObj import *
# The input training and test data subsets are ShogunFeature objects
[feats_train,
feats_test,
labelsTr,
labelsTs] = load_multiclassToy('../shogun-data/toy/',# Directory
         'train_multiclass.dat',# Sample dataSet file name
         'label_multiclass.dat')# Multi-class Labels file name

mkl_object = mklObj() # Learning machine global instantiation
#Function for mapping:    
def mkPool(path): # path: a list of learning parameters
    global feats_train # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs

    global mkl_object 

    if path[0][0] == 'gaussian':  # compare string values with ==, not identity
        a = 2*path[0][1][0]**2
        b = 2*path[0][1][1]**2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each listelement (paths[i]) as learning parameter:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                   targetsTr=labelsTr,
                   featsTs=feats_test,
                   targetsTs= labelsTs,
                   kernelFamily=path[0][0],
                   randomRange=[a, b],            
                   randomParams=[(a + b)/2, 1.0],  
                   hyper=path[3],       
                   pKers=path[2])
    # Returns the test error:
    return mkl_object.testerr

if __name__ == '__main__':

    p = Pool(3)
    # Loading the experimentation grid of parameters.
    grid = gridObj(file = 'gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials = 3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already made tests with passing 'paths' and '[paths]' and the error is the same.
    print p.map(mkPool, [a, b, c])

See the error output below:

/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
Traceback (most recent call last):
The entered hyperparameter distribution is not allowed: weibull
  File "../mklCall.py", line 76, in <module>
The entered hyperparameter distribution is not allowed: linear
    print p.map(mkPool, [a, b, c])
The entered hyperparameter distribution is not allowed: triangular
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1

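The custom exception message above ("The entered hyperparameter distribution is not allowed") should not appear, because weibull (like the other values shown) is a valid string for that input parameter. So something seems to go wrong at execution time for a reason I cannot identify. The message is printed once per path, that is, len(paths) times.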

If I run the training for a single path, without using Pool.map(), there is no error.
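Because the error only shows up under Pool.map(), one thing that may be worth ruling out is the module-level mkl_object that every worker process reuses across paths. The following is only a sketch of the per-process alternative, not a confirmed fix: ToyLearner, init_worker, train_path and run_grid are hypothetical names, and ToyLearner is a stand-in for mklObj so that the example runs without Shogun. It uses the initializer argument of multiprocessing.Pool to build one learner inside each worker.

```python
from multiprocessing import Pool

_learner = None  # one instance per worker process, set by init_worker()


class ToyLearner(object):
    """Hypothetical stand-in for mklObj; only the attributes used here."""

    def __init__(self):
        self.mklC = None
        self.weightRegNorm = None
        self.testerr = None

    def fit(self, pKers):
        # Placeholder arithmetic so the sketch runs without Shogun.
        self.testerr = self.mklC * self.weightRegNorm / float(pKers)


def init_worker():
    """Pool initializer: build a fresh learner inside each worker process."""
    global _learner
    _learner = ToyLearner()


def train_path(path):
    """Configure this process's learner from one path; return its test error."""
    _learner.mklC = path[5]
    _learner.weightRegNorm = path[4]
    _learner.fit(pKers=path[2])
    return _learner.testerr


def run_grid(paths, processes=3):
    """Map train_path over the paths using per-process learners."""
    pool = Pool(processes, initializer=init_worker)
    try:
        return pool.map(train_path, paths)
    finally:
        pool.close()
        pool.join()
```

run_grid(paths) would be called under the `if __name__ == '__main__':` guard, in the same place as the p.map(...) call in the code above.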

I also ran the code sequentially over several paths, and there was no error:

acc = []   
for path in paths:
    print 'A path: ', path
    acc.append(mkPool(path))
    print 'Accuracy: ', acc[-1]
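Since the sequential loop succeeds, the failure appears to be specific to the worker processes, and the traceback printed by Pool.map() on Python 2.7 only shows the parent side (pool.py), not the line inside the worker that raised the TypeError. A minimal sketch of one way to recover that information is to catch the exception inside the worker and return the formatted traceback as data. Here train_one is a hypothetical stand-in for mkPool, and safe_worker and run_all are illustrative names, not part of the original code.

```python
import traceback
from multiprocessing import Pool


def train_one(path):
    """Hypothetical stand-in for mkPool(path); swap in the real training call."""
    kernel_family, (low, high) = path[0]
    if low >= high:
        raise ValueError('bad range for %s: %r' % (kernel_family, (low, high)))
    return (low + high) / 2.0


def safe_worker(path):
    """Run train_one and return (path, result, error_text) instead of raising."""
    try:
        return path, train_one(path), None
    except Exception:
        # format_exc() is evaluated inside the worker process, so it keeps
        # the file and line number where the exception was actually raised.
        return path, None, traceback.format_exc()


def run_all(paths, processes=3):
    """Map safe_worker over the paths; failures come back as traceback text."""
    pool = Pool(processes)
    try:
        return pool.map(safe_worker, paths)
    finally:
        pool.close()
        pool.join()
```

Under the `if __name__ == '__main__':` guard, iterating over run_all(paths) and printing the third element of each tuple whenever it is not None should show which call inside the worker returned None.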

I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html. Suggestions, examples, or possible solutions would be much appreciated.

Thank you in advance.
