The actual problem I wish to solve is, given a set of N unit vectors and another set of M vectors calculate for each of the unit vectors the average of the absolute value of the dot product of it with every one of the M vectors. Essentially this is calculating the outer product of the two matrices and summing and averaging with an absolute value stuck in-between.
For N and M not too large this is not hard and there are many ways to proceed (see below). The problem is when N and M are large the temporaries created are huge and provide a practical limitation for the provided approach. Can this calculation be done without creating temporaries? The main difficulty I have is due to the presence of the absolute value. Are there general techniques for "threading" such calculations?
As an example consider the following code
N = 7
M = 5
# Create the unit vectors, just so we have some examples,
# this is not meant to be elegant
phi = np.random.rand(N)*2*np.pi
ctheta = np.random.rand(N)*2 - 1
stheta = np.sqrt(1-ctheta**2)
nhat = np.array([stheta*np.cos(phi), stheta*np.sin(phi), ctheta]).T
# Create the other vectors
m = np.random.rand(M,3)
# Calculate the quantity we desire, here using broadcasting.
S = np.average(np.abs(np.sum(nhat*m[:,np.newaxis,:], axis=-1)), axis=0)
This is great, S is now an array of length N and contains the desired results. Unfortunately in the process we have created some potentially huge arrays. The result of
np.sum(nhat*m[:,np.newaxis,:], axis=-1)
is a M X N array. The final result, of course, is only of size N. Start increasing the sizes of N and M and we quickly run into a memory error.
As noted above, if the absolute value were not required then we could proceed as follows, now using einsum()
T = np.einsum('ik,jk,j', nhat, m, np.ones(M)) / M
This works and works quickly even for quite large N and M . For the specific problem I need to include the abs()
but a more general solution (perhaps a more general ufunc) would also be of interest.