I am using WMD (Word Mover's Distance) to compute the similarity between sentences. For example:
distance = model.wmdistance(sentence_obama, sentence_president)
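Reference: https://markroxor.github.io/gensim/static/notebooks/WMD_tutorial.html
However, there is also a WMD-based similarity class (WmdSimilarity), described in the same tutorial.
Apart from the obvious point that one is a distance and the other a similarity, what is the difference between the two?
Update: The two compute the same thing, just expressed differently. WmdSimilarity calls wmdistance for each document in the corpus and converts each distance to a similarity as 1 / (1 + distance), as this excerpt from gensim's docsim.py shows: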
n_queries = len(query)
result = []
for qidx in range(n_queries):
    # Compute similarity for each query.
    qresult = [self.w2v_model.wmdistance(document, query[qidx]) for document in self.corpus]
    qresult = numpy.array(qresult)
    qresult = 1./(1.+qresult)  # Similarity is the negative of the distance.

    # Append single query result to list of all results.
    result.append(qresult)
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/similarities/docsim.py
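To make the relationship concrete, here is a minimal sketch of the conversion used in the excerpt above, written in plain Python so it runs without gensim or a trained model. The function names `wmd_to_similarity` and `similarity_to_wmd` are hypothetical helpers, not part of gensim, and the distance values are made up for illustration.

```python
import math


def wmd_to_similarity(distance):
    """Map a non-negative distance to a similarity in (0, 1].

    Same formula as in the docsim.py excerpt: 1 / (1 + distance).
    Hypothetical helper, not a gensim function.
    """
    return 1.0 / (1.0 + distance)


def similarity_to_wmd(similarity):
    """Invert the mapping to recover the distance from a similarity."""
    return 1.0 / similarity - 1.0


# Made-up distances standing in for values returned by model.wmdistance(...).
distances = [0.0, 1.0, 3.0]
similarities = [wmd_to_similarity(d) for d in distances]

# A distance of 0 gives similarity 1; larger distances give smaller similarities.
print(similarities)

# The mapping is invertible, so no information is lost.
recovered = [similarity_to_wmd(s) for s in similarities]
print(recovered)

# An infinite distance maps to a similarity of 0.
print(wmd_to_similarity(math.inf))
```

Because the mapping is strictly decreasing, ranking documents by highest similarity gives the same order as ranking them by lowest distance. The practical difference is mostly one of interface: wmdistance compares a single pair of documents, while WmdSimilarity applies the same computation to a query against every document in a corpus.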