I am new to the IPython parallel package but really want to get it going. I have a 4D numpy array, and I want to loop over its slices, rows, and columns and process the 4th dimension (time) at each point. The processing is a minimization routine that takes a while, which is why I would like to parallelize it.

from IPython.parallel import Client
from numpy import *
from matplotlib.pylab import *

c = Client()

v = c.load_balanced_view()
v.block=False

def process( src, freq, d ):
        # Get slice, row, col
        sl,r,c = src

        # Get data
        mm = d[:,sl,c,r]

        # Call fitting routine
        <fitting routine that requires freq, mm and outputs multiple parameters>

        return <output parameters??>


##  Create the mask of what we are going to process
mask = zeros(d[0].shape)
mask[sl][ nonzero( d[0,sl] > 10*median(d[0]) ) ] = 1

# find all non-zero points in the mask
points = array(nonzero( mask == 1)).transpose()

# Call async
asyncresult = v.map_async( process, points, freq=freq, d=d )

Besides the point itself, my function "process" needs two extra arguments: 1) freq, a numpy array of shape (100, 1), and 2) d, which has shape (100, 50, 110, 110) or so. I want to retrieve several parameters from the fitting.

All the examples I have seen that use map_async pass simple lambda functions and the like, and their outputs are trivial.

What I want is to apply "process" to every point in d where the mask is nonzero and to get maps of the output parameters in the same space. [Added: I am getting "process() takes exactly 3 arguments (1 given)".]

(A second step may be needed, since I am passing the huge numpy array "d" to every process. But once I figure out the data passing, I should hopefully be able to find a more efficient way of doing this.)

Thanks for any help.

1 Answer

I solved the data-passing problem by defining

def mapper(x):
    return x[0](*x[1:])  # call x[0] with the remaining items as arguments; apply() was removed in Python 3

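and calling map_async with a list of tuples, where the first element of each tuple is my function and the remaining elements are that function's arguments.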

asyncResult = pool.map_async(mapper, [(func, arg1, arg2) for arg1, arg2 in myArgs])
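
To make the pattern concrete, here is a minimal end-to-end sketch. It rests on several assumptions: the answer never says what `pool` is, so `multiprocessing.pool.ThreadPool` is used as a stand-in that runs without a cluster; `fit_point` is a hypothetical placeholder for the real fitting routine; and the data are small synthetic arrays. The last step shows one way the returned tuples might be scattered back into per-parameter maps.

```python
from multiprocessing.pool import ThreadPool  # stand-in pool (assumption)

import numpy as np


def mapper(x):
    # x is (func, arg1, arg2, ...): call func with the remaining items
    return x[0](*x[1:])


def fit_point(point, freq, d):
    # Hypothetical stand-in for the real fitting routine: returns two
    # made-up "parameters" (mean and peak of the time course at this point).
    # freq is accepted but unused here; a real fit would need it.
    sl, r, c = point
    mm = d[:, sl, r, c]
    return float(mm.mean()), float(mm.max())


def run_fits(pool, points, freq, d):
    # One tuple per task: the function first, then its arguments
    tasks = [(fit_point, tuple(p), freq, d) for p in points]
    return pool.map_async(mapper, tasks).get()


# Small synthetic data laid out as (time, slice, row, col)
d = np.arange(4 * 2 * 3 * 3, dtype=float).reshape(4, 2, 3, 3)
freq = np.linspace(0.0, 1.0, 4)
points = np.array([[0, 0, 0], [1, 2, 1]])

with ThreadPool(2) as pool:
    results = run_fits(pool, points, freq, d)

# Scatter each returned parameter back into a map shaped like the mask
mean_map = np.zeros(d[0].shape)
peak_map = np.zeros(d[0].shape)
for (sl, r, c), (mean, peak) in zip(points, results):
    mean_map[sl, r, c] = mean
    peak_map[sl, r, c] = peak
```

Note that every task tuple carries a reference to `d`. With a process-based or remote pool, that would mean serializing the whole array once per task, so sending it to the workers once would probably be the next thing to look at.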

I tried a lambda first, but apparently it can't be pickled, so that was a no-go.
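
The pickling limitation can be checked directly with the standard `pickle` module, which is what `multiprocessing` uses to ship work to its workers. The sketch below also shows `functools.partial` as one possible alternative for binding fixed arguments. That is a swapped-in technique, not something the answer uses, and `operator.mul` is only an illustrative stand-in for a real function.

```python
import operator
import pickle
from functools import partial

# A lambda has no importable name, so pickle cannot store a reference to it
double_lambda = lambda x: operator.mul(2, x)

# A partial over an importable function pickles fine: pickle stores a
# reference to operator.mul plus the bound argument
double_partial = partial(operator.mul, 2)

try:
    pickle.dumps(double_lambda)
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False

# Round-trip the partial, as a process pool would when sending it to a worker
restored = pickle.loads(pickle.dumps(double_partial))
```

The same should hold for a user-defined function, provided it is defined at the top level of an importable module so that the workers can look it up by name.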

Answered 2012-09-26T01:35:45.983