python - Python找到最接近的匹配句子

Question

我正在尝试从专辑中获取曲目（歌曲）列表，对于给定的曲目，我想获得所有匹配的曲目。我已经提到了下面的示例，关于如何在 python 中进行此操作的任何想法？似乎 difflib.get_close_matches 只适用于单个单词而不是一个句子。

示例：（查找包含字符串“环游世界”的任何内容

tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

输出：

 Around The World (La La La La La) (Radio Version)
 Around The World (La La La La La) (Alternative Radio Version)
 Around The World (La La La La La) (Acoustic Mix)
 Around The World (La La La La La) (Rüegsegger#Wittwer Club Mix)

score 8 · Accepted Answer

difflib.get_close_matches可以使用字符串（除了单个单词）。在这种情况下，您需要降低截止值（默认值为 0.6），并提高n最大匹配数：

In [19]: import difflib

In [20]: tracks = ['Around The World (La La La La La) (Radio Version)', 'Around The World (La La La La La) (Alternative Radio Version)', 'Around The World (La La La La La) (Acoustic Mix)', 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)', 'World In Motion','My Heart Beats Like A Drum (Dam Dam Dam)','Thinking Of You','Why Oh Why','Mistake No. 2','With You','Love Is Blind','Lonesome Suite','Let Me Come & Let Me Go']

In [21]: difflib.get_close_matches('Around the world', tracks, n = 4,cutoff = 0.3)
Out[21]: 
['Around The World (La La La La La) (Acoustic Mix)',
 'Around The World (La La La La La) (Radio Version)',
 'Around The World (La La La La La) (Alternative Radio Version)',
 'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)']

score 2 · Accepted Answer

filter(lambda x: 'Around The World' in x, tracks)

这将为您提供名称中包含的歌曲列表'Around The World'。如果您使用的是 Python 3，请将其转换为列表 ( list(filter(...)))，因为它返回一个filter对象。

如果有错别字，那我帮不了你。

score 1 · Accepted Answer

为此，您可以利用SequenceMatcher的get_matching_blocks

>>> from pprint import PrettyPrinter
>>> from difflib import SequenceMatcher
>>> pp = PrettyPrinter(indent = 4)
>>> pp.pprint(tracks)
[   'World In Motion',
    'With You',
    'Why Oh Why',
    'Thinking Of You',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go',
    'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)']
>>> seq = ((e, SequenceMatcher(None, 'Around the world', e).get_matching_blocks()[0]) for e in tracks)
>>> seq = [k for k, _ in sorted(seq, key = lambda e:e[-1].size, reverse = True)]
>>> pp.pprint(seq)
[   'Around The World (La La La La La) (Rucegsegger#Wittwer Club Mix)',
    'Around The World (La La La La La) (Radio Version)',
    'Around The World (La La La La La) (Alternative Radio Version)',
    'Around The World (La La La La La) (Acoustic Mix)',
    'World In Motion',
    'With You',
    'Thinking Of You',
    'Why Oh Why',
    'My Heart Beats Like A Drum (Dam Dam Dam)',
    'Mistake No. 2',
    'Love Is Blind',
    'Lonesome Suite',
    'Let Me Come & Let Me Go']
>>>

score -2 · Accepted Answer

你可以这样做。

temp = "Around The World (La La La La La)"

for string in fh.readlines():
    if temp in string:
       print temp

如果它与您正在阅读的任何文件中的温度相匹配，它将打印出来。

或者您可以使用正则表达式进行匹配。

python - Python找到最接近的匹配句子

4 回答 4

Related

Reference