
I would like to know how to compare two hash values by some means other than Hamming distance.

Is there a way to do that?

The end goal is to determine a Python dictionary key that similar images can share.

For example:

from PIL import Image
import imagehash

# img1, img2, and img3 are the same image
img1_hash = imagehash.average_hash(Image.open('data/image1.jpg'))
img2_hash = imagehash.average_hash(Image.open('data/image2.jpg'))
img3_hash = imagehash.average_hash(Image.open('data/image3.jpg'))
img4_hash = imagehash.average_hash(Image.open('data/image4.jpg'))
print(img1_hash, img2_hash, img3_hash, img4_hash)
# Output: 81c38181bf8781ff 81838181bf8781ff 81838181bf8781ff ff0000ff3f00e7ff

This is the result I want to print:

{common value1 : [81c38181bf8781ff, 81838181bf8781ff, 81838181bf8781ff], common value2: [ff0000ff3f00e7ff]}

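I tried converting the images to hash values and comparing those, but please let me know if there is another approach that does not require converting to hashes.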


1 Answer


You can use any string distance metric, for example one from rapidfuzz, and feed it into a clustering algorithm.

Make sure to pip install rapidfuzz, then:

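from rapidfuzz import fuzz
import numpy as np
from sklearn.cluster import dbscan

hashes = ["81c38181bf8781ff", "81838181bf8781ff", "81838181bf8781ff", "ff0000ff3f00e7ff"]

# DBSCAN expects numeric input, so cluster the indices and look up the strings inside the metric
X = np.arange(len(hashes)).reshape(-1, 1)

def rapidfuzz_dist(x, y):
    # fuzz.ratio is a similarity in [0, 100]; convert it to a distance in [0, 1]
    i, j = int(x[0]), int(y[0])
    return 1 - (fuzz.ratio(hashes[i], hashes[j]) / 100)

# dbscan returns (core sample indices, cluster labels); `clusters` holds the labels
core_samples, clusters = dbscan(X, metric=rapidfuzz_dist, eps=.5, min_samples=1)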

This creates the clusters, which you can then print in a form close to what your question asks for:

for cluster in set(clusters):
    print( f"cluster: {cluster}:")
    print( [ h for h,c in zip(hashes,clusters) if c == cluster] )

which gives:

cluster: 0:
['81c38181bf8781ff', '81838181bf8781ff', '81838181bf8781ff']
cluster: 1:
['ff0000ff3f00e7ff']
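
If you want the dictionary shape from the question rather than printed lists, the cluster labels can be regrouped with the standard library. This is only a sketch: the `clusters` list below is hard-coded from the output shown above, `group_by_cluster` is a hypothetical helper name, and using the first hash of each cluster as the shared key is an assumption, since the question does not say what the "common value" should be.

```python
from collections import defaultdict

# Hashes and labels copied from the example above; in practice `clusters`
# would be the label array returned by dbscan.
hashes = ["81c38181bf8781ff", "81838181bf8781ff", "81838181bf8781ff", "ff0000ff3f00e7ff"]
clusters = [0, 0, 0, 1]

def group_by_cluster(hashes, clusters):
    """Map one representative hash per cluster to all hashes in that cluster."""
    by_label = defaultdict(list)
    for h, label in zip(hashes, clusters):
        by_label[int(label)].append(h)
    # Assumption: the first hash seen in each cluster serves as the shared key.
    return {members[0]: members for members in by_label.values()}

groups = group_by_cluster(hashes, clusters)
print(groups)
```

Any stable representative would work as the key; the first member is simply the easiest one to pick.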
Answered 2021-11-27T20:25:31.787