I am trying to compute the cosine similarity between two sentences of unequal length using the word2vec Google News corpus, but I get this error: AxisError: axis 1 is out of bounds for array of dimension 1

Here is my code:

from gensim.models import KeyedVectors
EMBEDDING_FILE = '/root/input/GoogleNews-vectors-negative300.bin.gz' # from above
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)

vocab = word2vec.vocab.keys()
wordsInVocab = len(vocab)

import numpy as np

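def sent_vectorizer(sent, model):
    # Sum the first 50 dimensions of each in-vocabulary word vector.
    sent_vec = np.zeros(50)
    for w in sent.split():  # iterate over words, not characters
        try:
            vc = model[w][:50]
        except KeyError:
            continue  # skip out-of-vocabulary words
        sent_vec = np.add(sent_vec, vc)
    # Scale the summed vector to unit length.
    return sent_vec / np.sqrt(sent_vec.dot(sent_vec))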

a = sent_vectorizer('Football is played in Brazil', word2vec)
b = sent_vectorizer('Cricket is played in India', word2vec)

word2vec.cosine_similarities(b,a)

I convert the sentences to vectors because cosine_similarity takes an array of vectors as input. How can I fix this?

1 Answer

word2vec.cosine_similarities takes a vector as its first argument and a matrix as its second argument.

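You are passing a vector as the second argument, so there is no axis 1 to take norms along. Either add one more axis to it, or use np.stack to stack a and b together into a matrix.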
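A minimal sketch of both fixes, using only NumPy so that it runs without the Google News model. The local `cosine_similarities` function is a stand-in written for illustration (it is an assumption about the shape handling, not gensim's own code), and `a` and `b` are random placeholders for the two sentence vectors:

```python
import numpy as np

def cosine_similarities(vector_1, vectors_all):
    """Illustrative stand-in: compare one vector against the rows of a matrix."""
    norm = np.linalg.norm(vector_1)
    all_norms = np.linalg.norm(vectors_all, axis=1)  # needs a 2-D array
    return vectors_all.dot(vector_1) / (norm * all_norms)

# Hypothetical stand-ins for the two 50-dimensional sentence vectors.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(size=50)

# Passing the 1-D vector `a` as the second argument reproduces the error.
try:
    cosine_similarities(b, a)
    raised = False
except ValueError:  # NumPy's AxisError is a subclass of ValueError
    raised = True

# Fix 1: add an axis so `a` becomes a 1 x 50 matrix.
sim_reshape = cosine_similarities(b, a.reshape(1, -1))

# Fix 2: stack the vectors into a 2 x 50 matrix; row 0 is `a`, row 1 is `b`.
sims = cosine_similarities(b, np.stack([a, b]))

# Reference value computed directly from the definition.
expected = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

With the model from the question, the corresponding call would presumably be `word2vec.cosine_similarities(b, a.reshape(1, -1))`, which returns a one-element array holding the similarity between the two sentences.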

Answered 2020-02-17T07:20:52.267