python - Return the indices of a list of fuzzy matches

Question

I have a list of "ids":

ids = [None, '20160928a', '20160929a', ... ]

I also have a second list of "ids", found using fuzzywuzzy, that contains the IDs which are repeated:

repeat_offenders = ['20160928a', '20161115a', '20161121a', ... ]

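I would like to use fuzzywuzzy again to create a list of lists containing the positions (by index) of the repeated IDs in the "ids" list. The output would look like this (because they are repeats, each inner list contains at least two elements):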

collected_ids = [[0,5,700], [6,3], [4,826,12]]

My attempt, which currently returns only the IDs rather than their positions:

from fuzzywuzzy import process

collected_urls = []
for offender in repeat_offenders[:10]:
    # Each match is a (choice, score) tuple
    best_match = process.extract(offender, ids)
    collection = []
    for match in best_match:
        if match[1] > 95:
            collection.append(match[0])  # appends the ID itself, not its index
    collected_urls.append(collection)

Update: I tried using Moe's answer to find and group exact matches:

idz = ids
collected_ids = []
for i in range(len(idz)):
    tmp = [i]
    for j in range(len(ids)):
        if idz[i] == idz[j] and i != j:
            tmp.append(j)
            del j 
    if len(tmp) > 1:
        collected_ids.append(tmp)
    del i

score 1 · Accepted Answer

If using fuzzywuzzy is not a must, you can use two for loops to check for duplicates and build the list as follows:

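collected_ids = []
for i in range(len(ids)):  # range replaces Python 2's xrange
    tmp = [i]
    for j in range(len(ids)):
        if ids[i] == ids[j] and i != j:
            tmp.append(j)
    if len(tmp) > 1:
        collected_ids.append(tmp)
# Each group is found once per member. Lists are unhashable, so
# deduplicate via sorted tuples and convert back to lists.
collected_ids = [list(group) for group in sorted({tuple(sorted(tmp)) for tmp in collected_ids})]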
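
For exact matches, a single pass with a dictionary is another option. The following is a minimal sketch, assuming every ID is hashable (strings and None are). The function name group_duplicate_indices and the sample list are illustrative and do not come from the original posts.

```python
from collections import defaultdict


def group_duplicate_indices(ids):
    """Group the indices of equal values, keeping only values that repeat."""
    positions = defaultdict(list)
    for index, value in enumerate(ids):
        positions[value].append(index)
    # Dicts keep insertion order (Python 3.7+), so groups appear in
    # order of each value's first occurrence.
    return [indices for indices in positions.values() if len(indices) > 1]


# Illustrative sample data
sample = ['x', 'y', 'x', 'z', 'y']
print(group_duplicate_indices(sample))  # [[0, 2], [1, 4]]
```

Each index is visited once, so no group is produced twice and no separate deduplication step is needed.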

Edit:

If you want to avoid duplicates, you can create a list to check whether an index has already been added, as follows:

collected_ids = []
ids = ['a', 'b', 'a', 'c', 'd', 'a', 't', 't', 'k', 'c']
check = [] 
for i in range(len(ids)):
    tmp = [i]
    check.append(i)  
    for j in range(len(ids)):
        if ids[i] == ids[j] and i != j and j not in check:
            tmp.append(j)
            check.append(j)
    if len(tmp) > 1:
        collected_ids.append(tmp)
print(collected_ids)

Output:

[[0, 2, 5], [3, 9], [6, 7]]
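
If fuzzy matching is still required, the same index bookkeeping can be combined with a similarity score. The sketch below is only an illustration: it uses difflib.SequenceMatcher from the standard library as a stand-in for fuzzywuzzy's scoring so that it runs without extra dependencies. The helper names, the sample data, and the reuse of the question's threshold of 95 are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def fuzzy_index_groups(ids, offenders, threshold=95):
    """For each offender, collect the indices in ids scoring above threshold."""
    groups = []
    for offender in offenders:
        matches = [
            index
            for index, candidate in enumerate(ids)
            # Skip None entries, which cannot be compared as strings.
            if candidate is not None and similarity(offender, candidate) > threshold
        ]
        # Only keep groups that really are repeats.
        if len(matches) > 1:
            groups.append(matches)
    return groups


# Hypothetical sample data, not taken from the question.
sample_ids = [None, '20160928a', '20160929a', '20160928a', '20161115a', '20161115a']
sample_offenders = ['20160928a', '20161115a']
print(fuzzy_index_groups(sample_ids, sample_offenders))  # [[1, 3], [4, 5]]
```

Swapping similarity for a fuzzywuzzy scorer such as fuzz.ratio should behave comparably, though the exact scores may differ.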
