2

I am attempting to use Python's multiprocessing library to experiment with distributed neural networks. At the moment, I have it set up so that a server process creates the neural network and chunks the input for mini-batch gradient descent with the batches being put into a shared queue, processed by a client process, and the result put into a separate shared queue.

So far, everything is working except that in order to process the batches and produce a gradient, the child processes need a copy of the network weights, which I have shared using a multiprocessing Array. The client processes only need a read-only copy of the weights, but the server process updates the local copies after each training epoch.

My question is how would I update the shared memory to reflect the changed weights so that on the next epoch, the client processes have the correct values for computing gradients.

4

1 回答 1

1

自从阅读本文以来,我一直在玩,multiprocessing发现在 an 中更新数据mp.Array并不太难——让我感到惊讶的是,当使用循环迭代Array. 下面的代码片段设置了一个简单的 master-worker 集,使用mp.Process(使用Pool会更好,但这对我来说更快),其中 anmp.Array用于同步 master 经常更改的数据(尽可能快)

from multiprocessing import Process, RLock, Array
from time import sleep

def worker(n, array, arrayLock):
    while True:
        arrayLock.acquire()
        print("Worker: %i -> %s" % (n, ",".join(str(i) for i in array)))
        arrayLock.release()
        sleep(n + 1)

if __name__ == '__main__':
    arrayLock = RLock()
    array = Array('i', range(10), lock=arrayLock)

    pd = {}
    for i in range(3):
        pd[i] = Process(target=worker, args=(i, array, arrayLock))
        pd[i].start()

    try:
        while True:
            arrayLock.acquire()
            for i in range(len(array)):
                array[i] = -array[i]
            arrayLock.release()
    except KeyboardInterrupt:
        pass

    for p in pd.values():
        p.terminate()

导致以下输出

~> python mp_shared.py
Worker: 0 -> 0,1,2,3,4,5,6,7,8,9
Worker: 1 -> 0,-1,-2,-3,-4,-5,-6,-7,-8,-9
Worker: 2 -> 0,1,2,3,4,5,6,7,8,9
Worker: 0 -> 0,-1,-2,-3,-4,-5,-6,-7,-8,-9
Worker: 1 -> 0,-1,-2,-3,-4,-5,-6,-7,-8,-9
Worker: 0 -> 0,1,2,3,4,5,6,7,8,9

跨进程更新数据只需更改Array. 我遇到了一个问题,结果看起来像这样(注意数据的交替符号)

Worker: 0 -> 0,-1,2,-3,4,-5,6,-7,8,-9
Worker: 1 -> 0,-1,2,-3,4,-5,6,-7,8,-9
Worker: 2 -> 0,-1,2,-3,4,-5,6,-7,8,-9

这是因为当我读取或写入数组时,Lock自动创建的Array不会同步整个循环的访问!主进程将在Array工作人员获取锁之间进行更改。

为避免这种情况,我刚刚创建了我自己的RLock(需要RLock像触摸Array一样获取它,如果你已经获取了它会阻塞 a Lock)用于Array. 我将它传递RLock给所有工作人员,以便他们每个人都可以进行原子操作(在您的情况下,我确信读写是原子的以防止梯度计算中的错误很重要)。

编辑:

另一种选择似乎是,mmap但我不能评论它的使用,如果改变在这里按需要工作。

于 2013-02-03T04:54:00.953 回答