I'm trying to use the word2vec module from gensim, a natural language processing library for Python.
The docs say to initialize the model like this:
from gensim.models import Word2Vec
# note: gensim >= 4.0 renamed the `size` parameter to `vector_size`
model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)
What format does gensim expect for the input sentences? I have raw text like:
"the quick brown fox jumps over the lazy dogs"
"Then a cop quizzed Mick Jagger's ex-wives briefly."
etc.
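As far as I understand gensim, `Word2Vec` does not take raw strings: it expects an iterable of sentences, where each sentence is a list of string tokens. A minimal sketch of that conversion for the two example lines, assuming a simple lowercase-and-regex tokenizer is acceptable (the `tokenize` and `to_sentences` helpers and the `TOKEN_RE` pattern are hypothetical, not part of gensim):

```python
import re
from typing import Iterable, List

# Hypothetical tokenizer: lowercase, then keep runs of letters/digits,
# allowing an apostrophe or hyphen inside a word ("jagger's", "ex-wives").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text: str) -> List[str]:
    """Lowercase a raw string and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def to_sentences(raw_texts: Iterable[str]) -> List[List[str]]:
    """Turn raw strings into a list of token lists (one list per sentence)."""
    return [tokenize(t) for t in raw_texts]


raw = [
    "the quick brown fox jumps over the lazy dogs",
    "Then a cop quizzed Mick Jagger's ex-wives briefly.",
]
sentences = to_sentences(raw)
print(sentences[1])
# ['then', 'a', 'cop', 'quizzed', 'mick', "jagger's", 'ex-wives', 'briefly']
```

gensim also ships `gensim.utils.simple_preprocess`, which does a similar lowercase-and-tokenize step; its rules differ from the regex above (for example, it drops very short tokens), so the two will not produce identical output.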
What additional processing do I need to do before passing this text to word2vec?
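If the text is too large to hold in memory, the input can be any restartable iterable rather than a list. `Word2Vec` makes several passes over its input (one to build the vocabulary, then one per training epoch), so a plain generator, which is exhausted after one pass, is not enough. A sketch of a file-backed corpus, assuming one sentence per line in a UTF-8 file; the `LineSentences` class and its tokenizer are hypothetical (gensim's own `gensim.models.word2vec.LineSentence` is similar but only splits on whitespace, so the text would need to be preprocessed already):

```python
import os
import re
import tempfile
from typing import Iterator, List

# Same hypothetical tokenizer idea: lowercase words, keep inner ' and -.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


class LineSentences:
    """Restartable corpus: every call to __iter__ reopens the file,
    so the data can be streamed any number of times."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                tokens = TOKEN_RE.findall(line.lower())
                if tokens:  # skip blank lines
                    yield tokens


# Usage: write a small file, then iterate over it twice.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("the quick brown fox jumps over the lazy dogs\n\n")
    tmp.write("Then a cop quizzed Mick Jagger's ex-wives briefly.\n")

corpus = LineSentences(tmp.name)
first_pass = list(corpus)
second_pass = list(corpus)  # still works: __iter__ reopens the file
os.remove(tmp.name)
```

Defining `__iter__` on a class, instead of writing a generator function, is what makes the second pass possible.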
Update: here is what I tried. After loading the sentences, the vocabulary is empty.
>>> sentences = ['the quick brown fox jumps over the lazy dogs',
...              "Then a cop quizzed Mick Jagger's ex-wives briefly."]
>>> x = word2vec.Word2Vec()
>>> x.build_vocab([s.encode('utf-8').split( ) for s in sentences])
>>> x.vocab
{}
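A likely reason for the empty vocabulary: `Word2Vec`'s `min_count` parameter defaults to 5, and any word seen fewer than `min_count` times across the whole corpus is discarded. In these two sentences no word occurs five times (only "the" occurs even twice), so nothing survives. The hypothetical `surviving_vocab` helper below only imitates that pruning rule in plain Python; it is not gensim's code:

```python
from collections import Counter
from typing import Dict, List


def surviving_vocab(sentences: List[List[str]], min_count: int = 5) -> Dict[str, int]:
    """Imitate min_count pruning: keep only tokens seen at least min_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok: n for tok, n in counts.items() if n >= min_count}


# Plain str tokens, split on whitespace as in the question (no .encode()).
sentences = [
    "the quick brown fox jumps over the lazy dogs".split(),
    "Then a cop quizzed Mick Jagger's ex-wives briefly.".split(),
]
print(surviving_vocab(sentences))               # {} with the default of 5
print(surviving_vocab(sentences, min_count=2))  # {'the': 2}
```

With gensim itself the fix would presumably be to pass a lower threshold, e.g. `Word2Vec(sentences, min_count=1)`, for a toy corpus like this. Two other details are worth checking: under Python 3, `s.encode('utf-8').split()` yields `bytes` tokens, and plain `str` tokens (`s.split()`) are probably the safer input; and in gensim 4.x the vocabulary is exposed as `model.wv.key_to_index` rather than `model.vocab`.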