I have a method inside a class that needs to do a lot of work in a loop, and I would like to spread the work over all of my cores.

I wrote the following code, which works with the built-in map(), but raises an error with pool.map().

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)

class OtherClass:
  def run(sentence, graph):
    return False

class SomeClass:
  def __init__(self):
    self.sentences = [["Some string"]]
    self.graphs = ["string"]

  def some_method(self):
      other = OtherClass()

      def single(params):
          sentences, graph = params
          return [other.run(sentence, graph) for sentence in sentences]

      return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

Why can't it pickle single()? I even tried to move single() to the global module scope (not inside the class - makes it independent of the context):

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)

class OtherClass:
  def run(sentence, graph):
    return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]

class SomeClass:
  def __init__(self):
    self.sentences = [["Some string"]]
    self.graphs = ["string"]

  def some_method(self):
      return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

and I get the following ...

Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>

2 Answers

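Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

You solved this error yourself by moving the nested target function single() to the top level.

Background:

The pool needs to pickle (serialize) everything it sends to its worker processes (IPC). Pickling a function actually saves only its name, and unpickling re-imports the function by that name. For this to work, the function has to be defined at the top level of a module. Nested functions cannot be imported by the child process, so the attempt to pickle them already raises an exception.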
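The point above can be sketched without a pool, assuming only the standard pickle module. textwrap.dedent stands in for any importable function; make_nested and can_pickle are hypothetical helpers written for this illustration.

```python
import pickle
import textwrap


def make_nested():
    """Return a function defined inside another function."""
    def nested(x):
        return x * 2
    return nested


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for local objects differs between Python versions.
        return False


# An importable function is stored by reference: the payload carries the
# module name and the function name, not the function's code.
payload = pickle.dumps(textwrap.dedent)
stored_by_name = b"textwrap" in payload and b"dedent" in payload

# Unpickling looks the name up again and returns the very same object.
same_object = pickle.loads(payload) is textwrap.dedent

# The nested function has "<locals>" in its qualified name, so it cannot be
# looked up by name and pickle refuses it.
nested_qualname = make_nested().__qualname__
nested_ok = can_pickle(make_nested())
```

This is why a worker process must be able to find the function under the same module and name as the parent did.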


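Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>

You are starting the pool before you define your function and classes, so the child processes cannot inherit any of that code. Move the pool start-up to the bottom of the file and protect it with if __name__ == '__main__':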

import multiprocessing

class OtherClass:
  def run(self, sentence, graph):
    return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]

class SomeClass:
   def __init__(self):
       self.sentences = [["Some string"]]
       self.graphs = ["string"]

   def some_method(self):
      return list(pool.map(single, zip(self.sentences, self.graphs)))

if __name__ == '__main__':  # <- prevent RuntimeError for 'spawn'
    # and 'forkserver' start_methods
    with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
        print(SomeClass().some_method())

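Appendix

...I would like to spread the work over all of my cores.

Potentially helpful background on how multiprocessing.Pool chunks work:

Python multiprocessing: understanding logic behind chunksize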
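As a rough illustration of what that background covers: when no chunksize is passed, CPython's Pool.map splits the input into about four chunks per worker. The sketch below mirrors that arithmetic. default_chunksize and split_into_chunks are hypothetical names, not part of multiprocessing, and the heuristic is an implementation detail rather than a documented guarantee, so it may differ between versions.

```python
def default_chunksize(n_items, n_workers):
    """Chunk size Pool.map would pick when chunksize is None (assumed heuristic)."""
    if n_items == 0:
        return 0
    # Aim for about four chunks per worker, rounding the chunk size up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def split_into_chunks(items, chunksize):
    """Split a list into consecutive chunks of at most chunksize items."""
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


# 100 items on 4 workers: divmod(100, 16) gives (6, 4), rounded up to 7.
size = default_chunksize(100, 4)
chunks = split_into_chunks(list(range(100)), size)
```

Note that the example in the question passes a single (sentences, graph) pair to pool.map, so there is only one task to hand out and only one worker will be busy. The work spreads over cores only when the iterable contains many items.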

answered 2018-09-11T20:45:33.487

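I accidentally discovered a very nasty workaround. It works as long as you use a def statement. If you declare the function you want to use in Pool.map with the global keyword at the beginning of the function that defines it, the pickling error goes away. But I would not rely on this in serious applications.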

import multiprocessing

class OtherClass:
  def run(self, sentence, graph):
    return False

class SomeClass:
  def __init__(self):
    self.sentences = [["Some string"]]
    self.graphs = ["string"]

  def some_method(self):
      global single  # This is ugly, but does the trick XD

      other = OtherClass()

      def single(params):
          sentences, graph = params
          return [other.run(sentence, graph) for sentence in sentences]

      # Create the pool only after single() exists as a module global, so
      # forked workers inherit it (this relies on the 'fork' start method).
      with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
          return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()
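Why the global declaration has this effect can be seen without any pool: it makes the inner def bind a module-level name, and the function's qualified name loses the "<locals>" part that pickle objects to. A small sketch, with define_local, define_global and shared_helper as hypothetical names:

```python
def define_local():
    def helper():
        return "local"
    return helper


def define_global():
    global shared_helper  # the def below now binds a module-level name

    def shared_helper():
        return "global"
    return shared_helper


# A plain nested function keeps "<locals>" in its qualified name.
local_qualname = define_local().__qualname__

# The globally declared one is named as if it were defined at the top level.
global_fn = define_global()
global_qualname = global_fn.__qualname__
bound_at_module_level = globals().get("shared_helper") is global_fn
```

Pickle can then find the function by name, but only in processes that already have that global. That is why the trick depends on the workers being forked after the def has run, and why it is fragile.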
answered 2020-05-18T22:00:55.703