I am trying to get color associations like these:
apple -> red
banana -> yellow
grass -> green
sky -> blue
Using the GoogleNews-vectors-negative300.bin vectors, I first tried
wv.similarity('apple',color)
where color is a primary color such as 'red', 'yellow', 'blue', and so on.
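For reference, the ranking step can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the helper name rank_colors and the color list are assumptions made for illustration, and wv is assumed to be a gensim KeyedVectors object loaded from the file named above.

```python
def rank_colors(wv, word, colors):
    """Return (color, similarity) pairs for `word`, most similar first.

    `wv` is assumed to expose a gensim-style `similarity(w1, w2)` method.
    """
    scores = [(color, float(wv.similarity(word, color))) for color in colors]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical usage, assuming gensim is installed and the vectors are local:
# from gensim.models import KeyedVectors
# wv = KeyedVectors.load_word2vec_format(
#     'GoogleNews-vectors-negative300.bin', binary=True)
# colors = ['red', 'yellow', 'blue', 'green', 'violet', 'cyan']
# print(rank_colors(wv, 'apple', colors))
```

The helper only sorts raw cosine similarities, so it inherits whatever non-color associations the embedding holds between the two words.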
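For fruits, 'orange' is always the top color association, probably because the vector conflates the color with the fruit. When I remove orange, the results are still odd: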
apple:
[('violet', 0.24978276994901127), ('green', 0.20656763297902447), ('red', 0.19834849929308024), ('yellow', 0.18963902211016806), ('cyan', 0.17945308073294569), ('blue', 0.13687176308102386)]
cherry:
[('violet', 0.27348741504236473), ('red', 0.25540695681746473), ('yellow', 0.24285150471329794), ('blue', 0.20400566489159569), ('green', 0.18741563150077917), ('cyan', 0.12736182067644364)]
banana:
[('yellow', 0.27708333668133234), ('green', 0.25977272141145935), ('red', 0.24736077659820707), ('violet', 0.23909913025940599), ('cyan', 0.16519069493338848), ('blue', 0.15660144725154587)]
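So apparently 'violet' aligns with 'apple' and 'cherry' along some other dimension (perhaps because they are all plants?).
I then tried framing it as an analogy. This works for some objects but does not generalize well: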
wv.most_similar(restrict_vocab=100000, positive=['apple','yellow'], negative=['banana'])
[(u'red', 0.5296207666397095), (u'orange', 0.501822829246521), (u'bright_yellow', 0.49562686681747437), (u'purple', 0.4909234642982483), (u'blue', 0.465557336807251), (u'pink', 0.43768370151519775), (u'colored', 0.4296746551990509), (u'brown', 0.4290006756782532), (u'bright_orange', 0.4261433482170105), (u'yellows', 0.4199957549571991)]
wv.most_similar(restrict_vocab=100000, positive=['grass','yellow'], negative=['banana'])
[(u'bright_yellow', 0.4722655713558197), (u'blue', 0.45448029041290283), (u'red', 0.43442922830581665), (u'lawns', 0.4275570809841156), (u'maroon', 0.4197036325931549), (u'bright_orange', 0.41167205572128296), (u'brown', 0.4110153317451477), (u'purple', 0.4074830412864685), (u'grassy', 0.4017237722873688), (u'striped', 0.40009182691574097)]
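To compare more objects the same way, the analogy query above could be wrapped in a small loop. This is only a sketch under assumptions: the function name analogy_colors and its parameters are hypothetical, wv is assumed to be a gensim KeyedVectors object, and banana/yellow is kept as the reference pair simply because it is the one used in the queries above.

```python
def analogy_colors(wv, objects, ref_object='banana', ref_color='yellow',
                   vocab_limit=100000):
    """Run 'ref_object is to ref_color as obj is to ?' for each object.

    Returns a dict mapping each object to the list of (word, similarity)
    pairs that `wv.most_similar` returns for that query.
    """
    results = {}
    for obj in objects:
        results[obj] = wv.most_similar(
            positive=[obj, ref_color],
            negative=[ref_object],
            restrict_vocab=vocab_limit,
        )
    return results


# Hypothetical usage with an already loaded `wv`:
# for obj, neighbours in analogy_colors(wv, ['apple', 'grass', 'sky']).items():
#     print(obj, neighbours[:3])
```

Running the same reference pair over several objects makes it easier to see which ones the analogy handles and which ones drift toward non-color neighbours such as 'lawns' or 'grassy'.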
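I also tried the Facebook fastText embeddings, but the results were worse. How should I approach this problem and isolate the vector for "the common color of an object"?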