
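I wrote a Python script that computes the semantic similarity between the words in a set. Based on that, I want to remove the words that are not strongly related to the others. Here is the code that removes words from the set.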

line_combined=copy(line1)
threshold = 1/len(line_combined)
for word3 in line_combined:
    print("simdict[" + word3 + "] =" + str(simdict[word3]))
    print ("ratio is: " + str(simdict[word3]/linesumsim))
    if(simdict[word3]/linesumsim)<threshold:
        line_combined.remove(word3)
        print word3 + " is removed"
print "the output is:"
print line_combined

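"line1" is the set of words under consideration, stored as a list. "simdict[word]" holds the sum of the similarities between "word" and the rest of the words in the set. "linesumsim" is the sum of the "simdict" values of all words in the set.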

The output is:

linesumsim is 2.82012427883
simdict[city] =0.517357507497
ratio is: 0.183452024217
simdict[mountain] =0.642265108364
ratio is: 0.227743547752
simdict[sky] =0.484908130427
ratio is: 0.171945660007
simdict[sun] =0.637289239227
ratio is: 0.225979132909
simdict[characteristics] =0.538304293319
ratio is: 0.190879635114
the output is:
['city', 'mountain', 'sky', 'sun', 'characteristics']

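Clearly some words have a ratio (simdict value divided by linesumsim) below the threshold, which is 0.2 in this case, but they are not removed.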


1 Answer


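You can't safely remove items from a list while iterating over that same list: each removal shifts the remaining items, so the loop skips elements.

Change

for word3 in line_combined:

to

for word3 in line1: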
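
A minimal sketch of the loop with that change applied, using the values printed in the question. One further point is an assumption on my part: the print statements suggest the script runs under Python 2, where 1/len(line_combined) is integer division and evaluates to 0. That would explain why the output shows no "is removed" lines at all, so the sketch writes the threshold as 1.0/len(...) to force float division.

```python
from copy import copy

# Values copied from the output shown in the question.
simdict = {
    "city": 0.517357507497,
    "mountain": 0.642265108364,
    "sky": 0.484908130427,
    "sun": 0.637289239227,
    "characteristics": 0.538304293319,
}
line1 = ["city", "mountain", "sky", "sun", "characteristics"]
linesumsim = sum(simdict[word] for word in line1)

line_combined = copy(line1)

# 1.0 forces float division; in Python 2, 1/len(...) would be 0.
threshold = 1.0 / len(line_combined)

# Iterate over the untouched original list and remove from the copy,
# so removals never shift the sequence being looped over.
for word3 in line1:
    if simdict[word3] / linesumsim < threshold:
        line_combined.remove(word3)

print(line_combined)
```

With these numbers only 'mountain' and 'sun' have a ratio of at least 0.2, so those are the two words left in line_combined.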
Answered 2013-08-02T09:15:09.210