python - Optimizing a search to find the next matching value in a list

Question

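I have a program that goes through a list and, for each object, finds the next instance that has a matching value. When it does, it prints out the location of each object. The program runs perfectly fine, but the trouble I'm running into is that when I run it with a large volume of data (~6,000,000 objects in the list) it takes far too long. If anyone could provide insight into how I could make the process more efficient, I would greatly appreciate it.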

def search(list):
    original = list
    matchedvalues = []
    count = 0
    for x in original:
        targetValue = x.getValue()
        count = count + 1
        copy = original[count:]
        for y in copy:
            if targetValue == y.getValue():
                print(str(x.getLocation()) + "," + str(y.getLocation()))
                break

score 2 · Accepted Answer

Perhaps you could build a dictionary that maps each item to the list of indices where it occurs, something like this:

values = [1,2,3,1,2,3,4]

from collections import defaultdict

def get_matches(x):
    my_dict = defaultdict(list)
    for ind, ele in enumerate(x):
        my_dict[ele].append(ind)
    return my_dict

Result:

>>> get_matches(values)
defaultdict(<type 'list'>, {1: [0, 3], 2: [1, 4], 3: [2, 5], 4: [6]})
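As a rough sketch of how this dictionary could answer the original question: within each index list, consecutive entries are an occurrence and its next match, so pairing them with zip yields every (position, next position) pair, including for values that occur more than twice. The helper name next_match_pairs is hypothetical, and plain values stand in for the question's objects (with those, the key would presumably be x.getValue()).

```python
from collections import defaultdict

def get_matches(x):
    # Map each value to the list of indices where it occurs, in order.
    my_dict = defaultdict(list)
    for ind, ele in enumerate(x):
        my_dict[ele].append(ind)
    return my_dict

def next_match_pairs(x):
    # Hypothetical helper: pair each index with the index of the next equal value.
    pairs = []
    for indexes in get_matches(x).values():
        # Consecutive entries of an index list are an occurrence and its next match.
        pairs.extend(zip(indexes, indexes[1:]))
    return sorted(pairs)

values = [1, 2, 3, 1, 2, 3, 4]
for first, nxt in next_match_pairs(values):
    print(str(first) + "," + str(nxt))
```

The list is traversed once to build the dictionary and the index lists are traversed once to form the pairs, so no inner search over the remainder of the list is needed.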

Edit:

I added this part in case it helps:

values = [1,1,1,1,2,2,3,4,5,3]

def get_next_item_ind(x, ind):
    my_dict = get_matches(x)
    indexes = my_dict[x[ind]]
    temp_ind = indexes.index(ind)
    if len(indexes) > temp_ind + 1:
        return indexes[temp_ind + 1]
    return None

Result:

>>> get_next_item_ind(values, 0)
1
>>> get_next_item_ind(values, 1)
2
>>> get_next_item_ind(values, 2)
3
>>> get_next_item_ind(values, 3)
>>> get_next_item_ind(values, 4)
5
>>> get_next_item_ind(values, 5)
>>> get_next_item_ind(values, 6)
9
>>> get_next_item_ind(values, 7)
>>> get_next_item_ind(values, 8)
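Note that get_next_item_ind rebuilds the dictionary on every call, so calling it for each index of a large list repeats the full pass each time. A minimal sketch of one alternative, assuming you want the next match for every position at once: walk the list backwards and remember the nearest later index seen for each value. The function name all_next_inds is hypothetical.

```python
def all_next_inds(x):
    # result[i] is the index of the next element equal to x[i], or None.
    result = [None] * len(x)
    last_seen = {}  # value -> nearest index to the right seen so far
    for ind in range(len(x) - 1, -1, -1):
        result[ind] = last_seen.get(x[ind])
        last_seen[x[ind]] = ind
    return result

values = [1, 1, 1, 1, 2, 2, 3, 4, 5, 3]
print(all_next_inds(values))
# [1, 2, 3, None, 5, None, 9, None, None, None]
```

After this single pass, looking up the next match for any index is a plain list access.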

score 1 · Accepted Answer

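There are a few ways to make the search more efficient by minimizing additional memory use (particularly when your data is BIG).

You can operate directly on the list you pass in and don't need to make copies of it, so there is no need for original = list or copy = original[count:].
You can test against slices of the original list and iterate over them with enumerate(p). You won't need the extra variable count, and enumerate(p) is efficient in Python.

Re-implemented, this becomes: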

def search(p):
    # iterate over p
    for i, value in enumerate(p):

        # if value occurs more than once, print locations
        # do not re-test values that have already been tested (if value not in p[:i])
        if value not in p[:i] and value in p[(i + 1):]:
            # the last number printed is the offset of the next match within p[i + 1:]
            print(value, ':', i, p[(i + 1):].index(value))

v = [1,2,3,1,2,3,4]

search(v)

1 : 0 2
2 : 1 2
3 : 2 2

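Implemented this way, it will only print out the values / locations of values that are repeated (which I think is what you intended in your original implementation).

Other considerations: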

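More than 2 occurrences of a value: If a value repeats many times in the list, you might want to implement a function that walks through the list recursively. As it stands, the code above doesn't address this case, and in your situation it may not need to.
Using dictionaries: I completely agree with Akavall above; dictionaries are a great way to look up values in Python, especially if you need to look values up again later in the program. This works best if you construct a dictionary instead of a list when you originally create the data. But if you are only doing this once, building the dictionary and querying it will cost you more time than simply iterating over the list as described above.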
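Whether the dictionary pays off for a one-off search depends on the data, so it may be worth measuring on your own input. The following is a rough sketch using timeit on a small random list; the function names, the list size, and the value range are all assumptions chosen for illustration, and no particular timing is implied.

```python
import random
import timeit

def next_by_scan(p):
    # For each position, scan forward for the next equal value (nested loops).
    result = []
    for i, value in enumerate(p):
        nxt = None
        for j in range(i + 1, len(p)):
            if p[j] == value:
                nxt = j
                break
        result.append(nxt)
    return result

def next_by_dict(p):
    # Single forward pass: remember where each value was last seen and
    # fill in that earlier position when the value shows up again.
    result = [None] * len(p)
    last_seen = {}  # value -> index of its most recent occurrence
    for i, value in enumerate(p):
        if value in last_seen:
            result[last_seen[value]] = i
        last_seen[value] = i
    return result

random.seed(0)
data = [random.randrange(500) for _ in range(2000)]  # assumed test data

assert next_by_scan(data) == next_by_dict(data)  # both give the same answer
print("scan:", timeit.timeit(lambda: next_by_scan(data), number=3))
print("dict:", timeit.timeit(lambda: next_by_dict(data), number=3))
```

Changing the list size and the number of distinct values to resemble your real data should give a more representative comparison.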

Hope this helps!
