python - Getting the probability of a text given a word embedding model in gensim word2vec

Question

I'm trying to get the most probable sequence of words using a gensim word2vec model. I found a pretrained model that provides these files:

word2vec.bin
word2vec.bin.syn0.npy
word2vec.bin.syn1neg.npy

This is the code I'm using to try to get the probability of a sentence with this model:

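from gensim.models import Word2Vec

model = Word2Vec.load(word_embedding_model_path)  # load the full model, not just its vectors
model.hs = 1        # attempt to switch to hierarchical softmax after training
model.negative = 0
print(model.score([sentence.split(" ")]))  # score() takes an iterable of tokenized sentences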

When I run this code, I get this error:

AttributeError: 'Word2Vec' object has no attribute 'syn1'

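Can anyone help me figure out how to solve this? More generally, I'd like to use a pretrained model to get the probability of a sequence of words appearing together.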

score 0 · Accepted Answer

You can't switch a model from negative sampling (e.g. negative=5, hs=0) to hierarchical softmax (e.g. hs=1, negative=0) after initial setup and training. The two modes use different internal properties, which are only created during setup and training. (For example, the property syn1 only exists in a model that was created and trained in hierarchical-softmax mode. Your files include syn1neg but no syn1, which indicates the model was trained with negative sampling.)

Since the score() method currently only works for HS models, you can only use it with models that were trained in that mode.

(Note also that a value from score() of a single text, against a single model, isn't interpretable as an absolute probability. It's only in comparison against the scores of other texts against the same model, or the same text against alternate models, that the relative value of the score becomes meaningful.)
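
To illustrate the "same text against alternate models" comparison, here is a rough sketch that turns the scores of one text under several models into relative weights. It assumes the scores are on a log scale and that each model gets equal prior weight; the helper name, the model names, and the numbers are all hypothetical and are not gensim output.

```python
import math


def relative_weights(log_scores):
    """Normalize log-scale scores into weights that sum to 1 (stable softmax)."""
    peak = max(log_scores.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    unnormalized = {name: math.exp(s - peak) for name, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Hypothetical scores of ONE text under two alternate models (made-up numbers).
log_scores = {"model_a": -42.0, "model_b": -45.0}
weights = relative_weights(log_scores)

for name, weight in weights.items():
    print(name, round(weight, 3))
```

Only the difference between the scores matters here: shifting both by the same constant leaves the weights unchanged, which is why a single score on its own tells you little.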
