Suppose I have the following source string:

Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great fall. All of <span id="two">the kings</span> horses and all the kings men.

and several other strings in a list, each separated by a new line:

Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the kings horses and all the kings men.

Humpty dumpty sat on the wall, all of the kings horses and all the kings men.

There is a humpty dumpty who had sat on the wall, and all of the kings horses and all the kings men.

Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the kings horses and all the kings men couldn't put him together again.

Humpty dumpty this is a completely related sentence.

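Using Python, I would like to find out which of the "other strings in the list" best matches the source string. Is there a good way to compute a "score" for each comparison between the source string and a target string, so that, based on some criterion, I can determine which string matches the source best? (In this case the most similar string should be the first one, since it is the source string without any "<span id="1"></span>" tags.)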
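Because the expected best match is just the source with its markup removed, it probably helps to strip the tags before computing any score. A minimal sketch, assuming the markup is always simple tags like the `<span>` elements above (a regex like this is not a general HTML parser); `strip_tags` is a hypothetical helper name:

```python
import re

# Hypothetical helper: delete anything that looks like <...>, keeping the text between tags.
def strip_tags(text):
    return re.sub(r"<[^>]+>", "", text)

SOURCE = ('Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great fall. '
          'All of <span id="two">the kings</span> horses and all the kings men.')

print(strip_tags(SOURCE))
```

With the tags gone, the cleaned source is character-for-character identical to the first string in the list, so any reasonable similarity score should rank that string highest.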


You can use the PyLevenshtein module to compute the Levenshtein distance and use it to determine how similar the strings are.

https://code.google.com/p/pylevenshtein/
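To show what the distance measures without assuming the third-party package is installed, here is a pure-Python sketch of the standard dynamic-programming Levenshtein distance (the package itself, imported as `Levenshtein`, offers `distance()` and `ratio()` that do this in C). The tag stripping, the normalisation in `similarity`, and the helper names are illustrative assumptions, not part of the answer:

```python
import re

def strip_tags(text):
    # Assumes simple markup like <span id="1">...</span>; not a general HTML parser.
    return re.sub(r"<[^>]+>", "", text)

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    # Hypothetical normalisation: 1.0 for identical strings, toward 0.0 as edits grow.
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def best_match(source, candidates):
    clean = strip_tags(source)
    return min(candidates, key=lambda c: levenshtein(clean, c))

SOURCE = ('Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great fall. '
          'All of <span id="two">the kings</span> horses and all the kings men.')
CANDIDATES = [
    "Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the kings horses and all the kings men.",
    "Humpty dumpty sat on the wall, all of the kings horses and all the kings men.",
    "There is a humpty dumpty who had sat on the wall, and all of the kings horses and all the kings men.",
    "Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the kings horses and all the kings men couldn't put him together again.",
    "Humpty dumpty this is a completely related sentence.",
]

print(best_match(SOURCE, CANDIDATES))
```

A lower distance means a closer match; dividing by the longer length is one common way to turn it into a 0 to 1 score so strings of different lengths can be compared.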

answered 2013-09-10T05:05:45.603

You could probably use something like difflib. It works with both Python 2 and 3.
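A minimal sketch of this approach using the standard library's `difflib.SequenceMatcher`, whose `ratio()` returns a similarity score between 0.0 and 1.0 (2*M/T, where M is the number of matched characters and T the total length of both strings). Stripping the `<span>` tags first and the helper names `strip_tags` and `rank_candidates` are assumptions made for illustration:

```python
import difflib
import re

def strip_tags(text):
    # Assumes simple markup like <span id="1">...</span>; not a general HTML parser.
    return re.sub(r"<[^>]+>", "", text)

def rank_candidates(source, candidates):
    """Return (score, candidate) pairs, best match first."""
    clean = strip_tags(source)
    scored = [(difflib.SequenceMatcher(None, clean, c).ratio(), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep list order
    return scored

SOURCE = ('Humpty dumpty <span id="1">sat</span> on a wall, humpty dumpty had a great fall. '
          'All of <span id="two">the kings</span> horses and all the kings men.')
CANDIDATES = [
    "Humpty dumpty sat on a wall, humpty dumpty had a great fall. All of the kings horses and all the kings men.",
    "Humpty dumpty sat on the wall, all of the kings horses and all the kings men.",
    "There is a humpty dumpty who had sat on the wall, and all of the kings horses and all the kings men.",
    "Humpty dumpty sat on some wall, humpty dumpty had a great fall. All of the kings horses and all the kings men couldn't put him together again.",
    "Humpty dumpty this is a completely related sentence.",
]

for score, candidate in rank_candidates(SOURCE, CANDIDATES):
    print("%.3f  %s" % (score, candidate))
```

If you only need the top few matches above a threshold, `difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)` wraps the same ratio-based scoring in a single call.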

answered 2013-09-10T05:06:48.163