I have:
a = "This is Product A with property B and propery C. Buy it now!"
b = "This is Product B with property X and propery Y. Buy it now!"
c = "This is Product C having no properties. Buy it now!"
I'm looking for an algorithm that does this:
> magic(a, b, c)
=> ['A with property B and propery C',
    'B with property X and propery Y',
    'C having no properties']
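If the duplicated text always sits at the very start and end of every string, as in the three examples, stripping the shared character prefix and suffix is enough. The following is a minimal sketch under that assumption. `magic` is the name from the question, and `os.path.commonprefix` is used only because it compares arbitrary strings character by character (it is not path-aware).

```python
import os.path


def magic(*texts):
    """Strip the prefix and suffix shared by every text.

    Sketch only: assumes the duplicated text sits at the start and end of
    each string, and compares character by character (not word by word).
    """
    if not texts:
        return []
    # Longest string that every text starts with.
    prefix = os.path.commonprefix(texts)
    # Reverse each text to find the longest string every text ends with.
    suffix = os.path.commonprefix([t[::-1] for t in texts])[::-1]
    result = []
    for t in texts:
        end = len(t) - len(suffix)
        # Guard against prefix and suffix overlapping (e.g. identical texts).
        start = min(len(prefix), end)
        result.append(t[start:end])
    return result
```

For `a`, `b` and `c` this strips `"This is Product "` and `". Buy it now!"`, giving exactly the desired list. Because it only removes text common to *all* inputs and can cut mid-word, a single atypical text among 1000+ would shrink the shared prefix or suffix, so it does not cover the word-order requirement in the update.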
I need to find the duplicated parts across more than 1,000 texts. Top performance isn't required, but it would be nice.
- Update
Word order matters. So, given:
d = 'This is Product D with text engraving: "Buy". Buy it now!'
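the first "Buy" should not be treated as a duplicate. I guess I'll need a threshold of n consecutive words before something counts as a duplicate.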
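One way to respect both word order and an n-word threshold is word n-gram (shingle) counting: an ordered run of n words that occurs in (nearly) every text is treated as boilerplate, while a lone "Buy" can never qualify because it is shorter than n words. Below is a minimal sketch of that technique. `strip_shared_runs`, `n` and `min_share` are hypothetical names, and splitting on whitespace is an assumption, so punctuation stays attached to words.

```python
from collections import Counter


def strip_shared_runs(texts, n=3, min_share=1.0):
    """Remove runs of >= n consecutive words that recur across texts.

    Hypothetical helper: `n` is the minimum run length (in words) that
    counts as a duplicate; `min_share` is the fraction of texts an n-word
    sequence must appear in. Tokens are whitespace-separated, so
    punctuation stays attached ("C." keeps its period).
    """
    tokenized = [t.split() for t in texts]

    # Document frequency: in how many texts does each ordered n-gram occur?
    doc_freq = Counter()
    for words in tokenized:
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        doc_freq.update(grams)

    threshold = min_share * len(texts)
    result = []
    for words in tokenized:
        covered = [False] * len(words)
        for i in range(len(words) - n + 1):
            if doc_freq[tuple(words[i:i + n])] >= threshold:
                # Overlapping shared n-grams together cover longer shared runs.
                for j in range(i, i + n):
                    covered[j] = True
        kept = [w for w, c in zip(words, covered) if not c]
        result.append(" ".join(kept))
    return result
```

With `a`, `b`, `c`, `d` and `n=3`, only `"This is Product"` and `"Buy it now!"` occur in all four texts, so `d` becomes `'D with text engraving: "Buy".'` and the quoted "Buy" survives. The work is one counting pass and one stripping pass, roughly linear in the total number of words, which should be manageable for 1000+ texts. On real data, a `min_share` below 1.0 tolerates a few texts that lack the boilerplate.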