我正在使用具有余弦相似度的 tf-idf 来计算描述(句子)相似度
输入字符串:
3/4x1/2x3/4 blk mi tee
以下是我需要在其中找到类似于输入字符串的句子的句子
smith-cooper® 33rt1 reducing pipe tee 3/4 x 1/2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1 x 1/2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/4 x 1 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 3/4 x 1-1/2 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 1-1/2 x 1-1/4 x 1 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2 x 2 x 3/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2 x 1-1/2 x 1-1/4 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 2-1/2 x 2 x 2 in npt 150 lb malleable iron black
smith-cooper® 33rt1 reducing pipe tee 3 x 3 x 2 in npt 150 lb malleable iron black
由于句子几乎相似,我使用 tf-idf 方法,它对出现在所有文档( Idf )中的单词给予低分,而对唯一单词给予更高的分数,这有助于更容易地找到相似的文档。
有没有比这更好的方法?