python - Find similar elements in a list using Python

Question

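I need to find similar items in a list using Python (e.g. two variants of the word "limit", or "Download ICD file" being similar to "Download ICD zip file"). I want the results to be similar in their characters, not in their numbers: for example, "Angle 1" and "Angle 2" should not count as similar. All the strings in my list end with '\0'.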

What I'm trying to do is split each item on whitespace and check whether any part consists of digits. But somehow it doesn't work the way I want.

Here is a sample of my code:

for k in range(len(split)):  # split already holds the whitespace-split list entry
    replace = split[k].replace("\\0", "")  # remove the \0 terminator so a number token is only digits
    is_num = lambda q: q.replace(".", "", 1).isdigit()  # lambda I found somewhere on the internet
    check = is_num(replace)
    if check == True:  # numeric token: stop and move on to the next list entry
        break
    elif check == False:  # I know, else would be fine too
        seq = difflib.SequenceMatcher(a=List[i].lower(), b=List[j].lower())
        if seq.ratio() > 0.9:
            print(Element1, "is similar to", Element2, "\t")
            break
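The loop above runs the `SequenceMatcher` comparison as soon as it meets the first non-numeric token, so for "Angle 1" the comparison already happens at "Angle", before the "1" is ever checked. Below is a minimal sketch of the described approach that checks every token of an item first and only then compares the remaining items pairwise with `difflib.SequenceMatcher`. Assumptions: the helper names `strip_terminator`, `has_numeric_token` and `similar_pairs` are hypothetical; the terminator may be either a real NUL character or the two-character text `\0`; and the threshold is a parameter because "Download ICD file" vs. "Download ICD zip file" scores about 0.89, just below the 0.9 used in the question.

```python
import difflib
from itertools import combinations


def strip_terminator(s: str) -> str:
    """Remove a trailing NUL character or the two-character text "\\0"."""
    s = s.rstrip("\x00")
    return s[:-2] if s.endswith("\\0") else s


def has_numeric_token(item: str) -> bool:
    """True if any whitespace-separated token is a number such as '1' or '2.5'."""
    # Same test as the question's lambda: drop one '.', then require only digits
    return any(tok.replace(".", "", 1).isdigit() for tok in item.split())


def similar_pairs(items, threshold=0.8):
    """Return (a, b, ratio) for each pair of items whose ratio exceeds threshold."""
    cleaned = [strip_terminator(s) for s in items]
    # Check all tokens first, so items like "Angle 1" are dropped entirely
    candidates = [s for s in cleaned if not has_numeric_token(s)]
    pairs = []
    for a, b in combinations(candidates, 2):
        # Case-insensitive comparison, as in the question
        ratio = difflib.SequenceMatcher(a=a.lower(), b=b.lower()).ratio()
        if ratio > threshold:
            pairs.append((a, b, ratio))
    return pairs


items = ["Download ICD file\\0", "Download ICD zip file\\0",
         "Angle 1\\0", "Angle 2\\0", "Settings\\0"]
for a, b, ratio in similar_pairs(items):
    print(f"{a!r} is similar to {b!r} (ratio {ratio:.2f})")
```

With these sample items only the two "Download ICD" entries are reported; "Angle 1" and "Angle 2" are filtered out before any comparison.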

score 0 · Accepted Answer

Try this; it uses get_close_matches from difflib instead of SequenceMatcher.

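from difflib import get_close_matches

a = ["abc/0", "efg/0", "bc/0"]

# Cut off the "/0" terminator exactly; rstrip("/0") would also strip any
# trailing '/' or '0' characters (e.g. "abc10/0" -> "abc1")
b = [s[:-2] if s.endswith("/0") else s for s in a]

for word in b:
    # Each word is also in b, so it appears in its own result list
    print(get_close_matches(word, b))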
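A variant of the loop above that also handles the numeric case from the question, written as a sketch under assumptions: the function name `close_matches_by_word` is hypothetical, and the "/0" terminator follows the sample data above. Each word is removed from its own candidate list, because `get_close_matches` would otherwise return the word itself as its best match, and items with a numeric token are dropped first. `n` and `cutoff` are real parameters of `difflib.get_close_matches` (defaults 3 and 0.6); unlike the question's code, this comparison is case-sensitive.

```python
from difflib import get_close_matches


def close_matches_by_word(items, n=3, cutoff=0.6):
    """Map each item (terminator removed) to its close matches among the other items."""
    # Remove the "/0" terminator used in the sample data
    words = [s[:-2] if s.endswith("/0") else s for s in items]
    # Drop items with a numeric token such as "Angle 1", as the question requires
    words = [
        w for w in words
        if not any(t.replace(".", "", 1).isdigit() for t in w.split())
    ]
    result = {}
    for i, word in enumerate(words):
        others = words[:i] + words[i + 1:]  # every word except this one
        result[word] = get_close_matches(word, others, n=n, cutoff=cutoff)
    return result


sample = ["abc/0", "efg/0", "bc/0", "Angle 1/0", "Angle 2/0"]
print(close_matches_by_word(sample))
# {'abc': ['bc'], 'efg': [], 'bc': ['abc']}
```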
