I trained a word2vec model on my corpus.
import io
from multiprocessing import cpu_count
from gensim.models import Word2Vec

corpus = "fewdata.txt"
# Read the corpus: each line becomes one whitespace-tokenized sentence.
with io.open(corpus, mode="r", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

# gensim 3.x parameter names; gensim 4.x renamed size -> vector_size and iter -> epochs.
model = Word2Vec(sentences=sentences, size=100, sg=1, window=3, min_count=1, iter=10, workers=cpu_count())
model.init_sims(replace=True)  # keep only the L2-normalized vectors
model.save('model.bin')
model = Word2Vec.load('model.bin')
print(model)
Then
model['aImIroawi']
array([-0.06561889, -0.15222837, 0.00912119, -0.11638119, -0.03242991,
-0.13457145, -0.09813376, 0.07011288, 0.0711898 , 0.10069774,
-0.01028561, 0.11995316, 0.03737569, -0.01811702, -0.12935248],
dtype=float32)
But I want to apply this model to a txt file with a vocabulary of 5333 words and save the result to a txt file in the form
{'Aimurawi': array([-0.04728228, 0.13645388, 0.13822217, 0.13086553, -0.0963688 ], dtype=float32),
 'Tiona': array([-0.04728228, 0.13645388, 0.13822217, 0.13086553, -0.0963688 ], dtype=float32)}
for every word in my text file. Can someone help me with how to do this?
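For reference, here is a minimal sketch of one way this could be done, not a definitive solution. It assumes the vocabulary file holds whitespace-separated words and that the lookup object supports `lookup[word]` and raises `KeyError` for unknown words, as `model.wv` does in gensim. The function name `dump_vectors` and the file names `vocab.txt` and `vectors.txt` are hypothetical.

```python
def dump_vectors(vocab_path, lookup, out_path):
    """Write "{'word': vector, ...}" for each distinct word in vocab_path.

    `lookup` is any object supporting lookup[word] that raises KeyError for
    unknown words (assumed here to be model.wv). Returns the list of words
    that had no vector.
    """
    entries, missing, seen = [], [], set()
    with open(vocab_path, encoding="utf-8") as src:
        for line in src:
            for word in line.split():
                if word in seen:  # write each word only once
                    continue
                seen.add(word)
                try:
                    entries.append("%r: %r" % (word, lookup[word]))
                except KeyError:
                    # Word was not in the training corpus, so no vector exists.
                    missing.append(word)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("{" + ",\n ".join(entries) + "}\n")
    return missing


# Possible usage with the model saved above (file names are assumptions):
# from gensim.models import Word2Vec
# model = Word2Vec.load('model.bin')
# missing = dump_vectors('vocab.txt', model.wv, 'vectors.txt')
# print(len(missing), 'words had no vector')
```

Words that never appeared in the training corpus have no vector, so the sketch collects them instead of failing. If the exact dictionary-like layout is not required, `model.wv.save_word2vec_format('vectors.txt', binary=False)` writes every word the model was trained on with its vector in plain text, which may be easier to load back later.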