python - A faster way to compute Hamming distance in a numpy array of long bit strings

Question

I am trying to speed up a Hamming distance computation over a numpy array of nearly 2 million bit strings, each 1280 bits long.

My current implementation takes about 4 seconds, which is far too slow for my use case. My current approach:

>>> import numpy as np

>>> A = np.array(['00000000000000000000000000000000000000000000000001101110000000000000000000000000000000000000000100011000100001000000000000000100011110101111011110000000000000000000000000000000000000001000000000000000000000000000001001100011000110000000000000000000000000000100000000000000000000000010000000000000000000100000000000000000000000000000100011000010000000000000010001100001000000000000001000011000110000000010000000001000010000100000000000000100001000010000000000000010000110001100000001100001000000000100001000010000100001000010000100000000000000000000000000000000001000010000100001000110000100011000010000100001000000000000000000000000000000000000000100011000010000100001000010000100001000000000000000000000000000000000000000000001000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'])

>>> A = A.repeat(2000000,axis=0)
>>> B = A

>>> A.shape
(2000000,)
>>> B.shape
(2000000,)

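>>> # np.fromstring in binary mode is deprecated; view the ASCII bytes directly instead.
>>> # '0' is ASCII 48, so subtracting 48 yields 0/1 values.
>>> first=(A.astype('S1280').view(np.uint8)-48).reshape(-1,1280)
>>> second=(B.astype('S1280').view(np.uint8)-48).reshape(-1,1280)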

>>> hamm_dist = (first!=second).sum(1)

>>> hamm_dist.shape
(2000000,)

Is there a faster approach that would bring the computation time down to under a second, or better?
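
One direction that may be worth trying, shown here only as a minimal sketch: pack each bit string into bytes with `np.packbits`, XOR the packed arrays, and count the set bits with a 256-entry lookup table. The comparison then touches 160 bytes per row instead of 1280. The helper names `bitstrings_to_packed` and `hamming_packed` are hypothetical, and the sketch assumes every string contains exactly `n_bits` characters, all of them '0' or '1'.

```python
import numpy as np

def bitstrings_to_packed(strings, n_bits):
    """Hypothetical helper: turn '0'/'1' strings into a packed uint8 array.

    Assumes every string has exactly n_bits characters, all '0' or '1'.
    """
    # Encode as fixed-width ASCII bytes, then view each character as one uint8.
    raw = np.asarray(strings).astype(f"S{n_bits}").view(np.uint8).reshape(-1, n_bits)
    # ASCII '0' is 48, so subtracting gives 0/1; packbits stores 8 bits per byte.
    return np.packbits(raw - 48, axis=1)

# Lookup table: number of set bits for each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_packed(a, b):
    """Hypothetical helper: row-wise Hamming distance of two packed arrays."""
    # XOR leaves a 1 bit wherever the inputs differ; the table counts those bits.
    return POPCOUNT[np.bitwise_xor(a, b)].sum(axis=1, dtype=np.int64)

a = bitstrings_to_packed(["0000000011111111", "1010101010101010"], 16)
b = bitstrings_to_packed(["0000000011111110", "0101010101010101"], 16)
dist = hamming_packed(a, b)
```

The packing step is a one-time conversion cost, so this is likely to pay off mainly when the packed arrays are reused or the data can be stored packed from the start. Whether it gets under one second for 2 million rows depends on hardware and memory bandwidth, so it would need to be timed on the real data.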
