python - Passing a dictionary with modifiable elements to a process in Python

Question

I am trying to parallelize my code for better performance using the Process class from the multiprocessing library.

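The skeleton of the code is that a dictionary is created for each thread to work on; after everything is done, the dictionaries are summed and saved to a file. The resources are created like this: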

histos = {}
for i in range(number_of_threads):
    histos[i] = {}
    histos[i]['all'] =      ...  # ROOT.TH1F objects
    histos[i]['kinds_of'] = ...  # ROOT.TH1F objects
    histos[i]['keys'] =     ...  # ROOT.TH1F objects

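Then, inside the processes, each thread uses its own histos[thread_number] object and works on the ROOT.TH1F objects it contains. My problem, however, is that if I start the threads with Process like this: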

proc = {}
for i in range(Nthreads):
    # Divide up the workload; // keeps the indices integers in Python 3.
    it0 = 0 + i * n_entries // Nthreads
    it1 = 0 + (i+1) * n_entries // Nthreads
    proc[i] = Process(target=RecoAndRecoFix, args=(i, it0, it1, ch, histos))
    # args: i is the thread id (index), it0 and it1 are indices for the workload,
    # ch is a variable that is read-only, and histos is what we defined before,
    # and the contained TH1Fs are what the threads put their output into.
    # The RecoAndRecoFix function works inside with histos[i], thus only accessing
    # the ROOT.TH1F objects that are unique to it. Each thread works with its own histos[i] object.
    proc[i].start()

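then the threads can indeed access their histos[i] objects, but cannot write to them. To be precise, when I call Fill() on the TH1F histograms, no data is filled, because the objects cannot be written to since they are not shared variables.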

So here: https://docs.python.org/3/library/multiprocessing.html I found that I should instead use multiprocessing.Array() to create an array that the threads can both read and write, like this:

typecoder = {}
histos = Array(typecoder, number_of_threads)
for i in range(number_of_threads):
    histos[i] = {}
    histos[i]['all'] =      ...  # ROOT.TH1F objects
    histos[i]['kinds_of'] = ...  # ROOT.TH1F objects
    histos[i]['keys'] =     ...  # ROOT.TH1F objects

However, it does not accept a dictionary as the type. It does not work; it says TypeError: unhashable type: 'dict'
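For context on that error: the first argument of multiprocessing.Array is a typecode (such as 'd' for a C double) or a ctypes type, so the array can only hold fixed-size C values, not Python dictionaries. The sketch below is only an illustration; the names `counts` and `try_dict_typecode` are made up for it and do not come from the original code.

```python
import multiprocessing as mp

# A shared array of four C doubles, zero-initialised.
counts = mp.Array("d", 4)
counts[0] = 1.5  # elements must be plain numbers matching the typecode


def try_dict_typecode():
    """Reproduce the failure: a dict is not a valid typecode."""
    try:
        mp.Array({}, 4)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `try_dict_typecode()` returns the text of the TypeError, which mentions the unhashable 'dict' type.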

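So what is the best way to solve this? I need to pass an instance of each of the "all kinds of keys" entries stored in the dictionary to every thread so that each can work on its own. They must be able to write to the resources they receive.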

Thanks for your help, and sorry if I overlooked something trivial; I have written threaded code before, but not in Python.

score 1 · Accepted Answer

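The missing piece is the difference between "processes" and "threads"; you mix them in your post, but your approach only works with threads, not with processes.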

Threads all share memory; they all reference the same dictionary, so they can use it to communicate with each other and with the parent.
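A minimal sketch of the shared-memory behaviour with threads, assuming plain lists as stand-ins for the ROOT.TH1F histograms; the names `fill` and `run_threads` are hypothetical and not from the question.

```python
import threading


def fill(histos, thread_id, start, stop):
    """Each thread writes only into its own histos[thread_id] entry."""
    for value in range(start, stop):
        histos[thread_id]["all"].append(value)


def run_threads(n_entries, n_threads):
    # Plain lists stand in for the histograms (an assumption for this sketch).
    histos = {i: {"all": []} for i in range(n_threads)}
    threads = []
    for i in range(n_threads):
        start = i * n_entries // n_threads
        stop = (i + 1) * n_entries // n_threads
        t = threading.Thread(target=fill, args=(histos, i, start, stop))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # The parent sees every write, because all threads share one dictionary.
    return histos
```

Here the writes made inside `fill` are visible in the parent after `join()`, which is the behaviour the question expected from Process.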

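Processes have separate memory; each one gets its own copy of the dictionary. If they want to communicate, they have to do so by other means (for example, using multiprocessing.Queue). On the other hand, this means they get the safety of isolation.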
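One way this could look with multiprocessing.Queue is sketched below: each child builds its result privately and sends it back to the parent, which then combines the pieces. It assumes the results can be pickled; plain dicts of counters stand in for the histograms, and `fill_range`, `run_workers`, and the `start_method` parameter are names invented for this sketch.

```python
import multiprocessing as mp


def fill_range(worker_id, start, stop, results):
    """Build a private result in the child and send it to the parent."""
    counts = {"all": 0, "even": 0}  # stand-in for the per-worker histograms
    for value in range(start, stop):
        counts["all"] += 1
        if value % 2 == 0:
            counts["even"] += 1
    results.put((worker_id, counts))  # pickled and sent through the queue


def run_workers(n_entries, n_workers, start_method=None):
    # start_method=None selects the platform default ("fork", "spawn", ...).
    ctx = mp.get_context(start_method)
    results = ctx.Queue()
    procs = []
    for i in range(n_workers):
        start = i * n_entries // n_workers
        stop = (i + 1) * n_entries // n_workers
        p = ctx.Process(target=fill_range, args=(i, start, stop, results))
        p.start()
        procs.append(p)
    # Read one result per worker before joining, so no child is left
    # blocked while trying to flush its data into the queue.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected

# With the "spawn" start method, call run_workers(...) from under
# an `if __name__ == "__main__":` guard.
```

With `n_entries=10` and `n_workers=2`, the parent receives one dictionary per worker and can sum them afterwards, instead of expecting the children to write into its own objects.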

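Another complicating factor in Python is the "GIL": threads share the same Python interpreter and mostly run serially, running in parallel only while doing I/O, accessing the network, or using libraries that make special provision for it (numpy, image processing, etc.). Processes, meanwhile, get full parallelism.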

score 0 · Accepted Answer

The Python multiprocessing module has a Manager class that provides dictionaries which can be shared across threads and processes.

For an example, see the documentation: https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
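A rough sketch of the Manager approach, under the assumption that each worker's result is a picklable object; plain dicts of counters stand in for the histograms, and `count_entries`, `run_with_manager`, and `start_method` are hypothetical names. Note that a plain dict stored inside a manager dict is copied on access, so the worker assigns its whole result rather than mutating a nested value in place.

```python
import multiprocessing as mp


def count_entries(worker_id, start, stop, shared):
    """Accumulate locally, then publish the result under this worker's key."""
    local = {"all": 0, "even": 0}  # stand-in for the per-worker histograms
    for value in range(start, stop):
        local["all"] += 1
        if value % 2 == 0:
            local["even"] += 1
    # Assign the whole value: in-place changes to a plain dict nested in
    # the proxy would not be sent back to the manager process.
    shared[worker_id] = local


def run_with_manager(n_entries, n_workers, start_method=None):
    # start_method=None selects the platform default start method.
    ctx = mp.get_context(start_method)
    with ctx.Manager() as manager:
        shared = manager.dict()  # proxy to a dict held by the manager process
        procs = []
        for i in range(n_workers):
            start = i * n_entries // n_workers
            stop = (i + 1) * n_entries // n_workers
            p = ctx.Process(target=count_entries, args=(i, start, stop, shared))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()
        # Copy into a regular dict before the manager shuts down.
        return dict(shared)

# With the "spawn" start method, call run_with_manager(...) from under
# an `if __name__ == "__main__":` guard.
```

Every access to the proxy is a round trip to the manager process, so accumulating locally and writing once per worker, as above, keeps that overhead small.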
