python - How to create a matrix with gensim similarities

Question

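Good afternoon or evening.

Apologies in advance for bothering the community with this. On to the problem: the goal is to determine the similarity between emotional words (Big Five) and the words in a text corpus derived from a person's reading behavior. I managed to write code that produces a list (see the end of the post), but what I actually want is a matrix. In other words, so far the output looks like this: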

corpus word \t emotional word \t similarity score

What I am aiming for is this, with each score separated by a tab:

Word from corpus	Emotional word 1	Emotional word 2	Emotional word n
Word 1	score	score	score
Word 2	score	score	score
Word n	score	score	score

Thank you all for reading, and apologies again if this has a simple solution; I just could not find it.

from gensim.models import Word2Vec
import os

#Paths to the necessary files
Model_Pfad = r'D:\OneDrive\Phyton\modelC.model'     #word2vec model
ausgabe = r'D:\OneDrive\Phyton\numbers.txt'         #file the 500 corpus words are dumped to
emo_file = r'D:\OneDrive\Phyton\test.txt'           #list of words of which the similarity is determined
out_file = r'D:\OneDrive\Phyton\Ergebnisse.txt'     #file with the results (raw string, so backslashes stay literal)

model = Word2Vec.load(Model_Pfad)


x = list(model.wv.index_to_key[:500]) # creates a list with the 500 most common words in the w2v

corpus_words = "\n".join(x)


#print(corpus_words, file = open (ausgabe,'a')) #just to check the 500 words if necessary

corpus_wordsB = r'D:\OneDrive\Phyton\numbers.txt'


with open(emo_file, 'r') as file: #load the target words
    list_emo = [line.lower() for line in file]

with open(corpus_wordsB, 'r') as file: #load the words from the corpus
    list_corpus = [line.lower() for line in file]


with open(out_file, 'w') as file:
    for emo_line in list_emo:
        w1 = emo_line.strip('\r\n') #get a word from the emo list

        for corpus_line in list_corpus:
            w2 = corpus_line.strip('\r\n') #get a word from the corpus list
            try:
                distance = round(model.wv.similarity(w1, w2), 5) #similarity between emotional word and word from corpus
            except KeyError:
                distance = 'N/A' #word is not in the vocabulary
            file.write(w1 + '\t' + w2 + '\t' + str(distance) + '\n')

print ('done')
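
One possible way to get the matrix layout described above, rather than one line per word pair, is to swap the loops: iterate over the corpus words in the outer loop and collect one score per emotional word into a single row. The following is only a minimal sketch under assumptions. The names `build_similarity_matrix`, `write_matrix` and `toy_similarity` are hypothetical and not part of the original post, and `toy_similarity` is a stand-in so that the example runs without a trained model. With the real model, `model.wv.similarity` could presumably be passed in its place, since it also takes two words and raises `KeyError` for words outside the vocabulary.

```python
import csv


def build_similarity_matrix(similarity, corpus_words, emo_words,
                            missing="N/A", digits=5):
    """Return a list of rows: a header row, then one row per corpus word.

    `similarity` is any callable taking two words and returning a number;
    it is expected to raise KeyError for unknown words.
    """
    rows = [["word"] + list(emo_words)]          # header: one column per emotional word
    for corpus_word in corpus_words:             # outer loop: one row per corpus word
        row = [corpus_word]
        for emo_word in emo_words:               # inner loop: one score per column
            try:
                row.append(round(float(similarity(emo_word, corpus_word)), digits))
            except KeyError:
                row.append(missing)              # word not in the vocabulary
        rows.append(row)
    return rows


def write_matrix(rows, path):
    """Write the rows as a tab-separated text file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


# Hypothetical stand-in for model.wv.similarity, for illustration only.
_TOY_SCORES = {
    ("joy", "sun"): 0.8123456,
    ("joy", "rain"): 0.1,
    ("fear", "sun"): 0.05,
    ("fear", "rain"): 0.4,
}


def toy_similarity(w1, w2):
    return _TOY_SCORES[(w1, w2)]                 # raises KeyError for unknown pairs


rows = build_similarity_matrix(toy_similarity, ["sun", "rain", "fog"], ["joy", "fear"])
for row in rows:
    print("\t".join(str(cell) for cell in row))
```

In the script from the question, the call would look roughly like `write_matrix(build_similarity_matrix(model.wv.similarity, corpus_list, emo_list), out_file)`, where `corpus_list` and `emo_list` are the word lists with line endings already stripped.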
