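I am trying to compute the cosine similarity between two sentences of unequal length using the word2vec Google News vectors, but I get this error: AxisError: axis 1 is out of bounds for array of dimension 1
Here is my code: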
from gensim.models import KeyedVectors
EMBEDDING_FILE = '/root/input/GoogleNews-vectors-negative300.bin.gz' # from above
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)
vocab = word2vec.vocab.keys()
wordsInVocab = len(vocab)
import numpy as np
def sent_vectorizer(sent, model):
    sent_vec = np.zeros(50)
    numw = 0
    for w in sent:
        try:
            vc = model[w]
            vc = vc[0:50]
            sent_vec = np.add(sent_vec, vc)
            numw += 1
        except:
            pass
    return sent_vec / np.sqrt(sent_vec.dot(sent_vec))
a = sent_vectorizer('Football is played in Brazil', word2vec)
b = sent_vectorizer('Cricket is played in India', word2vec)
word2vec.cosine_similarities(b, a)
I converted the sentences to vectors because cosine_similarities takes an array of vectors as input. How can I fix this?
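
For reference, here is a minimal sketch of what the error seems to come down to. As far as I can tell, the second argument is expected to be a 2-D array with one vector per row, and the norm is taken along axis 1, which does not exist when a 1-D vector is passed. The sketch uses only NumPy. `toy_vectors` is a hypothetical stand-in for the Google News model, and the `cosine_similarities` function here is my own re-implementation for illustration, not gensim's code. It also splits the sentence into words, since `for w in sent` on a string iterates over characters.

```python
import numpy as np

# Hypothetical toy embeddings standing in for the Google News vectors.
toy_vectors = {
    "football": np.array([1.0, 0.0, 0.0]),
    "cricket": np.array([0.8, 0.6, 0.0]),
    "played": np.array([0.0, 1.0, 0.0]),
    "brazil": np.array([0.0, 0.0, 1.0]),
    "india": np.array([0.0, 0.6, 0.8]),
}

def sent_vectorizer(sent, vectors):
    """Sum the vectors of known words and scale the result to unit length."""
    dim = len(next(iter(vectors.values())))
    sent_vec = np.zeros(dim)
    for w in sent.lower().split():  # iterate over words, not characters
        if w in vectors:
            sent_vec += vectors[w]
    norm = np.linalg.norm(sent_vec)
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return sent_vec / norm if norm > 0 else sent_vec

def cosine_similarities(vector_1, vectors_all):
    """Cosine similarity between one 1-D vector and each row of a 2-D array."""
    vectors_all = np.atleast_2d(vectors_all)  # a 1-D input becomes a single row
    norms = np.linalg.norm(vectors_all, axis=1) * np.linalg.norm(vector_1)
    return vectors_all.dot(vector_1) / norms

a = sent_vectorizer("Football is played in Brazil", toy_vectors)
b = sent_vectorizer("Cricket is played in India", toy_vectors)

# Reshape the second vector into a (1, dim) array before comparing.
sims = cosine_similarities(b, a.reshape(1, -1))
print(sims.shape, sims[0])
```

If that reading is right, then with the real model the equivalent change would presumably be to call word2vec.cosine_similarities(b, a.reshape(1, -1)), but I have not verified that against the gensim version used here.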