I built an AnnoyIndexer and ran some most_similar queries to find the nearest neighbors of some vectors in a 300-dimensional vector space. Here is the code for most_similar:
    def most_similar(self, vector, num_neighbors):
        """Find the approximate `num_neighbors` most similar items.

        Parameters
        ----------
        vector : numpy.array
            Vector for word/document.
        num_neighbors : int
            Number of most similar items

        Returns
        -------
        list of (str, float)
            List of most similar items in format [(`item`, `cosine_distance`), ... ]

        """
        ids, distances = self.index.get_nns_by_vector(
            vector, num_neighbors, include_distances=True)

        return [(self.labels[ids[i]], 1 - distances[i] / 2) for i in range(len(ids))]
Why is each returned distance divided by 2 and then subtracted from 1? Surely that messes up the largest and smallest distances?
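To make the arithmetic concrete, here is a minimal sketch of what I think the last line does to the distances. It assumes the index was built with Annoy's "angular" metric, which as far as I understand is sqrt(2 * (1 - cos)) and so ranges from 0 to 2. The helper names `annoy_angular_distance` and `to_similarity` are my own and are not part of the library.

```python
import math


def annoy_angular_distance(cos_sim):
    # Assumption: Annoy's "angular" metric, i.e. the Euclidean distance
    # between the two normalized vectors: sqrt(2 * (1 - cos)).
    return math.sqrt(2.0 * (1.0 - cos_sim))


def to_similarity(distance):
    # The same transformation that most_similar applies to each distance.
    return 1 - distance / 2


# Cosine similarities from "identical" down to "opposite".
cosines = [1.0, 0.5, 0.0, -1.0]
distances = [annoy_angular_distance(c) for c in cosines]
similarities = [to_similarity(d) for d in distances]

for c, d, s in zip(cosines, distances, similarities):
    print(f"cos={c:+.2f}  angular distance={d:.4f}  1 - d/2={s:.4f}")
```

If that assumption holds, the transformed values fall between 0 and 1 and the ordering is reversed rather than scrambled: the smallest distance becomes the largest score. The result is still not the exact cosine similarity, which under the same assumption would be 1 - d**2 / 2.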