
I'm struggling to come up with a non-obfuscated, efficient way of using NumPy to compute the autocorrelation function of a set of 3D vectors.

I have a set of vectors in 3D space, saved in an array:

a = np.array([[ 0.24463039,  0.58350592,  0.77438803],
       [ 0.30475903,  0.73007075,  0.61165238],
       [ 0.17605543,  0.70955876,  0.68229821],
       [ 0.32425896,  0.57572195,  0.7506    ],
       [ 0.24341381,  0.50183697,  0.83000565],
       [ 0.38364726,  0.62338687,  0.68132488]])

Their autocorrelation function is defined as

C(t, \{v\}_n) = \frac{1}{n-t} \sum_{i=0}^{n-1-t} \vec v_i \cdot \vec v_{i+t}

I'm struggling to code this up in an efficient, non-obfuscated way. I can compute it explicitly with two nested for loops (a sketch of that computation is below), but that's slow. There is a fast way of doing it with one of NumPy's built-in functions, but those seem to use an entirely different definition of the correlation function. A similar problem has been solved here, How can I use numpy.correlate to do autocorrelation?, but that doesn't handle vectors. Do you know how I can solve this?
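The explicit double loop I have in mind is something like the sketch below (the function name is mine):

import numpy as np

def vector_autocorrelate_naive(v):
    # direct transcription: C(t) = 1/(n-t) * sum_{i=0}^{n-1-t} v_i . v_{i+t}
    n = len(v)
    return np.array([sum(np.dot(v[i], v[i + t]) for i in range(n - t)) / (n - t)
                     for t in range(n)])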


4 Answers


NumPy routines work on 1D arrays. As a "minimal" improvement, use a vectorized operation for the normalization step (the np.arange in the last line):

import numpy as np

def vector_autocorrelate(t_array):
  n_vectors = len(t_array)
  # correlate each component independently
  acorr = np.array([np.correlate(t_array[:,i], t_array[:,i], 'full') for i in range(3)])[:, n_vectors-1:]
  # sum the correlations over the three components
  acorr = np.sum(acorr, axis=0)
  # divide by the number of values actually measured and return
  acorr /= (n_vectors - np.arange(n_vectors))
  return acorr
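Applied to the array a from the question:

print(vector_autocorrelate(a))  # length-6 array; element t is C(t)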

For larger array sizes, you should consider using a Fourier-transform algorithm for the correlation. If you are interested, have a look at the library tidynamics for an example (disclaimer: I wrote that library; it depends only on NumPy).
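For reference, the FFT route looks roughly like the sketch below; this is the general Wiener-Khinchin idea, not tidynamics' actual source, and the function name is mine:

import numpy as np

def vector_autocorrelate_fft(t_array):
    n = len(t_array)
    # zero-pad to length 2n so the circular correlation computed by the FFT
    # equals the linear correlation for lags 0..n-1
    padded = np.zeros((2 * n, t_array.shape[1]))
    padded[:n] = t_array
    f = np.fft.rfft(padded, axis=0)
    # inverse transform of the power spectrum gives sum_i v_i . v_{i+t} per component
    acorr = np.fft.irfft(f * np.conj(f), axis=0)[:n].sum(axis=1)
    return acorr / (n - np.arange(n))

This scales as O(n log n) instead of O(n**2), which is what pays off at large sizes.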

For reference, here are the timings for the NumPy code (which I wrote for testing), your code vector_autocorrelate, and tidynamics.

import time
import numpy as np
import tidynamics

size = [2**i for i in range(4, 17, 2)]

np_time = []
ti_time = []
va_time = []
for s in size:
    data = np.random.random(size=(s, 3))
    t0 = time.time()
    correlate = np.array([np.correlate(data[:,i], data[:,i], mode='full') for i in range(data.shape[1])])[:, s-1:]
    correlate = np.sum(correlate, axis=0)/(s-np.arange(s))
    np_time.append(time.time()-t0)
    t0 = time.time()
    correlate = tidynamics.acf(data)
    ti_time.append(time.time()-t0)
    t0 = time.time()
    correlate = vector_autocorrelate(data)
    va_time.append(time.time()-t0)

You can see the results:

print("size", size)
print("np_time", np_time)
print("va_time", va_time)
print("ti_time", ti_time)

size [16, 64, 256, 1024, 4096, 16384, 65536]

np_time [0.00023794174194335938, …]

va_time [0.00021696090698242188, 0.0001690387725830078, 0.000339508056640625, 0.0014629364013671875, …]

ti_time [0.0011148452758789062, 0.0008449554443359375, 0.0007512569427490234, 0.0010488033294677734, 0.0026645660400390625, 0.007939338684082031, 0.048232316970825195]

Or plot them:

import matplotlib.pyplot as plt

plt.plot(size, np_time)
plt.plot(size, va_time)
plt.plot(size, ti_time)
plt.loglog()

Except for very small data series, the "N**2" algorithm is unusable.

Answered 2018-02-17T21:27:42.400

These are my results. It is slower (by a factor of about 4) and delivers different results. Why am I posting it anyway? I think it is worth seeing how to measure and what the difference is. And, in addition, I would be more than happy if anybody finds the reason for the different results.

So, here is my solution:

%timeit [np.mean([np.dot(a[t], a[i+t]) for i in range(len(a)-t)]) for t in range(len(a))]

Result: 95.2 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In comparison, your solution is about 4 times faster:

%timeit vector_autocorrelate(a)

Delivering: 24.8 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
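A guess at the source of the discrepancy (an assumption on my part, not confirmed): the one-liner dots a[t] with a[i+t], while the definition calls for a[i] with a[i+t]. With that single index change it agrees with vector_autocorrelate:

# corrected indexing: a[i] . a[i+t] rather than a[t] . a[i+t]
reference = [np.mean([np.dot(a[i], a[i+t]) for i in range(len(a)-t)])
             for t in range(len(a))]
print(np.allclose(vector_autocorrelate(a), reference))  # expected: True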

Answered 2018-02-17T22:38:50.073

Hi, I ran into a similar problem. Here is my idea:

import numpy as np

def fast_vector_correlation(M):
    n_row = M.shape[0]
    # dot_mat[i, j] holds the scalar product of row vectors i and j
    dot_mat = M.dot(M.T)
    # the t-th superdiagonal collects every product v_i . v_{i+t}
    corr = np.array([np.trace(dot_mat, offset=t) for t in range(n_row)])
    corr /= (n_row - np.arange(n_row))
    return corr

The idea is that dot_mat contains all the scalar products between the row vectors. To compute the correlation for different values of t, you just have to sum the diagonals of the upper triangular part, where the diagonal at offset t collects the products v_i · v_{i+t}, as the check below illustrates.
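A quick check of the diagonal-summing idea (a sketch using the array a from the question):

t = 2
dot_mat = a.dot(a.T)
# the t-th superdiagonal of dot_mat collects every product a[i] . a[i+t]
diag_sum = sum(np.dot(a[i], a[i + t]) for i in range(len(a) - t))
print(np.isclose(np.trace(dot_mat, offset=t), diag_sum))  # expected: True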

Answered 2020-05-06T18:27:04.720

I'm posting the answer here in case someone else needs it, since it took me quite some time to figure out a viable approach. I ended up solving this by defining the following function:

import numpy as np

def vector_autocorrelate(t_array):
  n_vectors = len(t_array)
  # correlate each component independently
  acorr = np.array([np.correlate(t_array[:,i], t_array[:,i], 'full') for i in range(3)])[:, n_vectors-1:]
  # sum the correlations over the three components
  acorr = np.sum(acorr, axis=0)
  # divide by the number of values actually measured and return
  acorr = np.array([val / (n_vectors - i) for i, val in enumerate(acorr)])
  return acorr

If anybody has a better idea, I would really like to hear it, since I think the current version is still not as compact as it should be. It's better than nothing, though, which is why I'm posting it here.

Answered 2018-02-17T18:01:59.390