python - Python: comparing several thousand strings. Is there a faster way to compare them?

Question

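I have a set of about 6,000 packets which, for comparison purposes, I represent as strings (the first 28 bytes). I need to compare them against just as many packets, which I also represent as 28-byte strings.

I have to match each packet in one set against all the packets in the other. Matches are always unique.

I found that comparing the strings takes a while. Is there any way to speed up the process?

EDIT1: I would not like to permute the string elements, because I always make sure that the ordering between the packet list and the corresponding string list is preserved.

EDIT2: Here is my implementation: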

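# list1, list2: lists of packets (no duplicates present in each list!)
# listOfStrings1, listOfStrings2: corresponding lists of strings. Ordering is preserved.
tmpMatched = []    # [list1Index, list2Index] pairs
tmpUnmatched = []  # list1 indices with no match in list2
alreadyMatchedlist2Indices = []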
for list1Index in xrange(len(listOfStrings1)):
            stringToMatch = listOfStrings1[list1Index]
            matchinglist2Indices = [i for i, list2Str in enumerate(listOfStrings2)
                                if list2Str == stringToMatch and i not in alreadyMatchedlist2Indices]
            if not matchinglist2Indices:
                tmpUnmatched.append(list1Index)
            elif len(matchinglist2Indices) == 1:
                tmpMatched.append([list1Index, matchinglist2Indices[0]])
                alreadyMatchedlist2Indices.append(matchinglist2Indices[0])
            else:
                list2Index = matchinglist2Indices[0] #taking first matching element anyway
                tmpMatched.append([list1Index, list2Index])
                alreadyMatchedlist2Indices.append(list2Index)

score 5 · Accepted Answer

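---Here I am assuming that you take each string one by one and compare it with all the others.---

I suggest sorting your list of strings and comparing neighboring strings. This should have a runtime of O(n log n).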
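One way to read this suggestion for two lists, as a minimal sketch: sort both lists of strings while remembering the original indices, then walk the two sorted lists in step so that equal strings meet. The function name `match_by_sorting` and the sample data are hypothetical, and the sketch assumes there are no duplicates within each list, as stated in the question.

```python
def match_by_sorting(strings1, strings2):
    """Return (index1, index2) pairs for strings present in both lists.

    Hypothetical helper; assumes no duplicates within each list.
    """
    # Sort (string, original_index) pairs so the original positions survive.
    sorted1 = sorted((s, i) for i, s in enumerate(strings1))
    sorted2 = sorted((s, i) for i, s in enumerate(strings2))

    matches = []
    p1 = p2 = 0
    # Walk both sorted lists in step; each comparison advances a pointer.
    while p1 < len(sorted1) and p2 < len(sorted2):
        s1, i1 = sorted1[p1]
        s2, i2 = sorted2[p2]
        if s1 == s2:
            matches.append((i1, i2))
            p1 += 1
            p2 += 1
        elif s1 < s2:
            p1 += 1
        else:
            p2 += 1
    return matches


print(match_by_sorting(['a', 'b', 'c'], ['c', 'd', 'e']))  # [(2, 0)]
```

The two sorts dominate the cost at O(n log n); the walk itself is linear. The original lists are left untouched, so the ordering between packets and strings is preserved.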

score 4 · Accepted Answer

Here is a simple linear-time approach, at least if I understand your question correctly:

>>> def get_matches(a, b):
...     reverse_map = {x:i for i, x in enumerate(b)}
...     return [(i, reverse_map[x]) for i, x in enumerate(a) if x in reverse_map]
... 
>>> get_matches(['a', 'b', 'c'], ['c', 'd', 'e'])
[(2, 0)]

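This accepts two sequences of strings, a and b, and returns a list of matches represented as tuples of indices into a and b. It is O(n + m), where n and m are the lengths of a and b.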
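If the bookkeeping from the question is needed as well (each index of the second list used at most once, plus a list of unmatched indices), the same dictionary idea might be extended as follows. This is only a sketch: `match_once` and the sample data are hypothetical names, and it takes the first unused match when a string occurs more than once, as the question's code does.

```python
from collections import defaultdict, deque


def match_once(strings1, strings2):
    """Pair each index of strings1 with at most one unused index of strings2."""
    # Map each string to the queue of positions where it occurs in strings2.
    positions = defaultdict(deque)
    for j, s in enumerate(strings2):
        positions[s].append(j)

    matched, unmatched = [], []
    for i, s in enumerate(strings1):
        queue = positions.get(s)  # .get() avoids creating empty entries
        if queue:
            # Take the earliest position that has not been used yet.
            matched.append((i, queue.popleft()))
        else:
            unmatched.append(i)
    return matched, unmatched


matched, unmatched = match_once(['a', 'b', 'a', 'x'], ['a', 'a', 'b'])
print(matched)    # [(0, 0), (1, 2), (2, 1)]
print(unmatched)  # [3]
```

Each string is hashed once per list, so the whole pass stays linear in the combined length of the two lists.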

score 0 · Accepted Answer

What's wrong with:

matches = [packet for packet in list1 if packet in list2]
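With plain lists, `packet in list2` scans list2 for every packet, so the one-liner does O(n·m) comparisons overall. A minimal sketch of the same idea with a set, assuming the packets (or their string forms) are hashable; the sample data and the name `lookup` are made up for illustration:

```python
list1 = ['a', 'b', 'c']
list2 = ['c', 'd', 'e']

# Build the set once; each `in` test is then a hash lookup on average
# instead of a linear scan of list2.
lookup = set(list2)
matches = [packet for packet in list1 if packet in lookup]
print(matches)  # ['c']
```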

answered 2013-04-17T20:53:28.340
