In the code below, I dynamically create an object of the class defined inside the _py attribute by using the generate_object method.

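The code works perfectly if I do not use a concurrent approach. However, if I use concurrency from concurrent.futures, I do not get the desired result, because of an error that says (among other things):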

_pickle.PicklingError: Can't pickle <class '__main__.Script_0_1'>: attribute lookup Script_0_1 on __main__ failed

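After googling this error, I understood that only picklable objects can be passed as arguments to ProcessPoolExecutor.map(), so I decided to look into how to make my dynamic class picklable.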
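
A minimal sketch of why the error appears, under stated assumptions: pickle stores a class by reference (module name plus class name) and looks that name up again, so a class that only lives in a private exec() namespace cannot be found. The helper names build_dynamic_object and can_pickle are hypothetical, and the explicit namespace dict is a simplification of the locals()-based lookup in the toy code.

```python
import pickle

# Source of one generated class, equivalent to the template with 0 and 1 filled in.
CLASS_SOURCE = (
    "class Script_0_1:\n"
    "    def print_numbers(self):\n"
    "        print('Numbers = ', 0, 'and', 1)\n"
)


def build_dynamic_object(source, class_name):
    """Create the class with exec() in a private namespace and instantiate it."""
    # The class records this module's name, but is never bound as a module attribute.
    namespace = {"__name__": __name__}
    exec(source, namespace)
    return namespace[class_name]()


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling is refused."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


dynamic_obj = build_dynamic_object(CLASS_SOURCE, "Script_0_1")
print(can_pickle((0, 1)))       # True: plain integers pickle fine
print(can_pickle(dynamic_obj))  # False: the class cannot be looked up by name
```

This is the same lookup failure that ProcessPoolExecutor.map() hits when it tries to send such an object to a worker process.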

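The problem is that all the other solutions to this problem create the dynamic object in a different way (different from what I use in _string_to_object()). Examples: 1, 2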

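I would very much like to keep the dynamic object creation the way it is now, because a lot of my real code is based on it, so I am looking for a concurrent solution that works with the toy code below.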

Code

import random
import codecs
import re
from concurrent.futures import ProcessPoolExecutor
import multiprocessing

class A:
    def __init__(self):
        self._py = r'''
class Script_{0}_{1}:
\tdef print_numbers(self):
\t\tprint('Numbers = ', {0}, 'and', {1})
'''
    
    def generate_text(self, name_1, name_2):
        py = self._py.format(name_1, name_2)
        py = codecs.decode(py, 'unicode_escape')
        return py

    def generate_object(self, number_1, number_2):
        """ Generate an object of the class inside the string self._py """

        return self._string_to_object(self.generate_text(number_1, number_2))

    def _string_to_object(self, str_class, *args, **kwargs):
        """ Transform a program written inside str_class to an object. """

        exec(str_class)
        class_name = re.search("class (.*):", str_class).group(1).partition("(")[0]
        return locals()[class_name](*args, **kwargs)

from functools import partial

print('Single usage')
a = A()
script = a.generate_object(1, 2)
script.print_numbers()

print('Multiprocessing usage')
n_cores = 3
n_calls = 3

def concurrent_function(args):
    first_A = args[0]
    second_A = args[1]
    first_A.print_numbers()
    second_A.print_numbers()

with ProcessPoolExecutor(max_workers=n_cores) as executor:
    args = ( (A().generate_object(i, i+1), A().generate_object(i+1, i+2)) for i in range(n_calls))
    results = executor.map(concurrent_function, args)

2 Answers

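I could not come up with a way to create the Script classes in the global namespace while strictly adhering to your current scheme. However: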

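Since each call to the generate_object method creates a new class in the local namespace and instantiates an object of that class, why not defer that work so that it is done in the process pool? This has the added advantage of performing the class creation in parallel, and the dynamically created classes never need to be pickled. We now pass concurrent_function pairs of integers (number_1, number_2) instead of objects: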

import random
import codecs
import re
from concurrent.futures import ProcessPoolExecutor


class A:
    def __init__(self):
        self._py = r'''
class Script_{0}_{1}:
\tdef print_numbers(self):
\t\tprint('Numbers = ', {0}, 'and', {1})
'''

    def generate_text(self, name_1, name_2):
        py = self._py.format(name_1, name_2)
        py = codecs.decode(py, 'unicode_escape')
        return py

    def generate_object(self, number_1, number_2):
        """ Generate an object of the class inside the string self._py """

        return self._string_to_object(self.generate_text(number_1, number_2))

    def _string_to_object(self, str_class, *args, **kwargs):
        """ Transform a program written inside str_class to an object. """

        exec(str_class)
        class_name = re.search("class (.*):", str_class).group(1).partition("(")[0]
        return locals()[class_name](*args, **kwargs)

"""
from functools import partial

print('Single usage')
a = A()
script = a.generate_object(1, 2)
script.print_numbers()
"""


def concurrent_function(args):
    for arg in args:
        obj = A().generate_object(arg[0], arg[1])
        obj.print_numbers()

def main():
    print('Multiprocessing usage')
    n_cores = 3
    n_calls = 3

    with ProcessPoolExecutor(max_workers=n_cores) as executor:
        args = ( ((i, i+1), (i+1, i+2)) for i in range(n_calls))
        # wait for completion of all tasks:
        results = list(executor.map(concurrent_function, args))

if __name__ == '__main__':
    main()

Prints:

Multiprocessing usage
Numbers =  0 and 1
Numbers =  1 and 2
Numbers =  1 and 2
Numbers =  2 and 3
Numbers =  2 and 3
Numbers =  3 and 4

A more efficient way

There is no need to use exec. Use a closure instead:

from concurrent.futures import ProcessPoolExecutor

def make_print_function(number_1, number_2):
    def print_numbers():
        print(f'Numbers = {number_1} and {number_2}')

    return print_numbers



def concurrent_function(args):
    for arg in args:
        fn = make_print_function(arg[0], arg[1])
        fn()


def main():
    print('Multiprocessing usage')
    n_cores = 3
    n_calls = 3

    with ProcessPoolExecutor(max_workers=n_cores) as executor:
        args = ( ((i, i+1), (i+1, i+2)) for i in range(n_calls))
        # wait for completion of all tasks:
        results = list(executor.map(concurrent_function, args))

if __name__ == '__main__':
    main()

Prints:

Multiprocessing usage
Numbers = 0 and 1
Numbers = 1 and 2
Numbers = 1 and 2
Numbers = 2 and 3
Numbers = 2 and 3
Numbers = 3 and 4

Using an object cache to avoid creating new objects unnecessarily

obj_cache = {} # each process will have its own

def concurrent_function(args):
    for arg in args:
        # was an object created with this set of arguments: (arg[0], arg[1])?
        obj = obj_cache.get(arg)
        if obj is None: # must create new object
            obj = A().generate_object(arg[0], arg[1])
            obj_cache[arg] = obj # save object for possible future use
        obj.print_numbers()
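
The same per-process cache can also be sketched with functools.lru_cache from the standard library, which does the dictionary bookkeeping itself. This is an illustrative variant, not the code above: get_script_object and CLASS_TEMPLATE are hypothetical names, and the class is built with exec() into an explicit namespace dict rather than through A().generate_object.

```python
from functools import lru_cache

# Hypothetical stand-in for the template stored in A._py.
CLASS_TEMPLATE = (
    "class Script_{0}_{1}:\n"
    "    def print_numbers(self):\n"
    "        print('Numbers = ', {0}, 'and', {1})\n"
)


@lru_cache(maxsize=None)
def get_script_object(number_1, number_2):
    """Build the Script_<number_1>_<number_2> object once per process, then reuse it."""
    namespace = {}
    exec(CLASS_TEMPLATE.format(number_1, number_2), namespace)
    return namespace[f"Script_{number_1}_{number_2}"]()


def concurrent_function(args):
    # args is a tuple of (number_1, number_2) pairs, as in the examples above.
    for number_1, number_2 in args:
        get_script_object(number_1, number_2).print_numbers()


if __name__ == "__main__":
    concurrent_function(((0, 1), (1, 2)))
    concurrent_function(((1, 2), (2, 3)))  # the (1, 2) object is served from the cache
    print(get_script_object.cache_info())
```

Each worker process gets its own cache, so an object is rebuilt at most once per process rather than once overall.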
answered 2020-11-07T14:47:43.190

I may have found a way that does not need the exec() function. The implementation (with comments) is below.

import codecs
from concurrent.futures import ProcessPoolExecutor

class A:
    def __init__(self):
        self.py = r'''
class Script_{0}_{1}:
\tdef print_numbers(self):
\t\tprint('Numbers = ', {0}, 'and', {1})
'''
    def generate_text(self, number_1, number_2):
        py = self.py.format(number_1, number_2)
        py = codecs.decode(py, 'unicode_escape')
        return py

    def generate_object(self, number_1, number_2):
        class_code = self.generate_text(number_1, number_2)
        # Write the generated class to a file on disk
        with open("Script_" + str(number_1) + "_" + str(number_2) + ".py", "w") as file:
            file.write(class_code)
        # Now import it; the class then lives in its own importable module, where pickle can find it
        package = "Script_" + str(number_1) + "_" + str(number_2)
        class_name = "Script_" + str(number_1) + "_" + str(number_2)
        # This is the programmatic version of
        # from <package> import <class_name>
        class_name = getattr(__import__(package, fromlist=[class_name]), class_name)
        return class_name()

def concurrent_function(args):
    first_A = args[0]
    second_A = args[1]
    first_A.print_numbers()
    second_A.print_numbers()

def main():
    print('Multiprocessing usage')
    n_cores = 3
    n_calls = 2
    
    with ProcessPoolExecutor(max_workers=n_cores) as executor:
        args = ( (A().generate_object(i, i+1), A().generate_object(i+2, i+3)) for i in range(n_calls))
        results = executor.map(concurrent_function, args)

if __name__ == '__main__':
    main()

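Basically, instead of creating the class dynamically, I write it to a file. I do this because the root of my problem was that pickle could not locate the nested class when looking in the global scope. Now I import the class programmatically (after saving it to a file).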

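Of course, this solution has its own bottleneck in file handling, which is also costly. I have not measured whether file handling or exec is faster, but in my real case I only need one object of the synthesized class (rather than one per parallel call, as in the toy code provided), so the file option suits me best.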

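There is still one problem: with n_calls = 15 (for example), after running the script several times, it sometimes fails to import the module (the file that was just created). I tried putting a sleep() before importing it programmatically, but it did not help. The problem does not seem to occur with a small number of calls, and it appears to happen randomly. An example of part of the error stack is shown below: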

Traceback (most recent call last):
  File "main.py", line 45, in <module>
    main()
  File "main.py", line 42, in main
    results = executor.map(concurrent_function, args)
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 674, in map
    results = super().map(partial(_process_chunk, fn),
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 600, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 600, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 184, in _get_chunks
    chunk = tuple(itertools.islice(it, chunksize))
  File "main.py", line 41, in <genexpr>
    args = ( (A().generate_object(i, i+1), A().generate_object(i+2, i+3)) for i in range(n_calls))
  File "main.py", line 26, in generate_object
    class_name = getattr(__import__(package, fromlist=[class_name]), class_name)
ModuleNotFoundError: No module named 'Script_13_14'
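
One plausible cause, not confirmed here, is the import system's cached directory listing: the importlib documentation notes that when a module file is created after the interpreter has started, importlib.invalidate_caches() may need to be called before the new module is noticed. The sketch below applies that call right after writing the file. MODULE_DIR, CLASS_TEMPLATE, and the use of a temporary directory are assumptions for illustration; the code above writes into the current directory instead.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the template stored in A.py.
CLASS_TEMPLATE = (
    "class Script_{0}_{1}:\n"
    "    def print_numbers(self):\n"
    "        print('Numbers = ', {0}, 'and', {1})\n"
)

# Assumed location for generated modules; it must be on sys.path to be importable.
MODULE_DIR = Path(tempfile.mkdtemp(prefix="generated_scripts_"))
sys.path.insert(0, str(MODULE_DIR))


def generate_object(number_1, number_2):
    """Write the class to its own module file, import it, and return an instance."""
    name = f"Script_{number_1}_{number_2}"
    (MODULE_DIR / f"{name}.py").write_text(CLASS_TEMPLATE.format(number_1, number_2))
    # Tell the import system to re-scan directories for files created at runtime.
    importlib.invalidate_caches()
    module = importlib.import_module(name)
    return getattr(module, name)()


if __name__ == "__main__":
    for i in range(15):
        generate_object(i, i + 1).print_numbers()
```

When the objects are sent to a process pool, the worker processes must be able to import the generated modules as well, so the directory has to be on their sys.path too.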
answered 2020-11-09T20:13:33.690