python - 在 Python 中优化 for 循环以更快地工作

Question

我正在努力优化 Python 代码。目标是获取一个整数列表并计算并输出列表中有多少对。一对被认为是 2 个数字，差异为K(2在这种情况下) 例如：

k = 2
list = [1, 5, 3, 4, 2]

这里的对将是(1,3), (5,3), (2,4)
答案是：3

我想提高代码效率，当前版本需要 8 秒或更长时间。

cProfile告诉我这for number in sorted_array:是唯一需要花费所有时间的线路。但我似乎无法弄清楚如何优化for循环。

有没有人有任何经验或建议？太感谢了。

编码：

#generate random numbers
import bisect
import random
n_integers = random.sample(xrange(1, 29999), 29998)
####cProfile
import cProfile
pr = cProfile.Profile()
pr.enable()

#the difference between numbers we are looking for
k = 2
sorted_array = []
pairs_counter = 0

#insert N integers in array in sorted fashion and typecast
for number in n_integers:
    bisect.insort_left(sorted_array, number)

#iterate over the array and calculate (number + K)
for number in sorted_array:
    the_pair = number + k
    #check if the number+K is in the array
    if the_pair in sorted_array:
        pairs_counter += 1

print pairs_counter

#Close cProfile
pr.disable()
pr.print_stats(sort = 'time')

c简介：

30075 function calls in 7.995 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    7.834    7.834    7.834    7.834 <ipython-input-5-19d578e3c582>:19(<module>)
    29998    0.143    0.000    0.143    0.000 {_bisect.insort_left}
        1    0.016    0.016    0.159    0.159 <ipython-input-5-19d578e3c582>:15(<module>)

score 2 · Accepted Answer

如果[1,3,3,3,3,3,6]结果为五对( k=2)，您可以使用numpy'功能来消除 Python循环。broadcasting for

import numpy as np
import random

a = random.sample(xrange(1, 29999), 29998)
a = np.array(a)
# or just a = np.random.randint(1, 29999, 29998)

k = 2

创建一个新数组，其中包含所有可以配对的整数

b = a + k

b通过广播创建一个布尔数组a：这会产生一个二维数组，其中True' 到处都有一对。

c = a[:, np.newaxis] == b

总结所有True的

np.sum(c)

要不就：

np.sum(a[:, np.newaxis] == b)

如果如示例输入所示，列表仅包含唯一值，则numpy解决方案将是：

a = random.sample(xrange(1, 29999), 29998)
k = 2
a = np.array(a)
b = a + k
result = np.sum(np.in1d(b, a, assume_unique=True))

这要快得多。

事实上，如果这些值不是唯一的，numpy.in1d则比上面的广播解决方案要快得多。通过切换参数的顺序，您可以计算5 对[1,3,3,3,3,3,6]。

result = np.sum(np.in1d(a, b))

现在有点吃乌鸦的东西：将列表转换为一组（假设唯一值），纯 Python解决方案比numpy解决方案更快。

q = 10000
a = random.sample(xrange(1, q), q-1)
a = set(a)
result = sum(n+k in a for n in a)

使用sum生成器表达式不需要制作任何中间对象——这可能是其速度/效率的原因之一。

score 0 · Accepted Answer

使用更好的算法。在您的代码中

for number in sorted_array:
    the_pair = number + k
    #check if the number+K is in the array
    if the_pair in sorted_array:
        pairs_counter += 1

您正在检查整个数组the_pair，因此对列表进行排序没有任何收获。由于所有元素都是整数，所以列表排序后`the_pair，如果它出现在列表中只能出现在接下来的两个位置之一。尝试类似的东西

for index, number in sorted_array:
    if number+k in sorted_array[index+1:index+k+1]:
        <do whatever>

score 0 · Accepted Answer

@saulspatz 是正确的，您在代码中使排序无关紧要，但是我建议您跳过排序而不是生成数千个列表切片。如果您与不可变类型（例如：）进行比较，该in操作实际上非常快tuple()。因此，我将提出以下代码：

#generate random numbers
import bisect
import random
n_integers = tuple(random.sample(xrange(1, 29999), 29998))
####cProfile
import cProfile
pr = cProfile.Profile()
pr.enable()

#the difference between numbers we are looking for
k = 2
pairs_counter = 0

#iterate over the array and calculate (number + K)
for number in n_integers:
    the_pair = number + k
    #check if the number+K is in the array
    if the_pair in n_integers:
        pairs_counter += 1

print pairs_counter

#Close cProfile
pr.disable()
pr.print_stats(sort = 'time')

输出：

29996
     1 function calls in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.000    0.000 {method 'disable' of'_lsprof.Profiler' objects}

python - 在 Python 中优化 for 循环以更快地工作

3 回答 3

Related

Reference