4

If my x and y lists are:

x = [10,20,30]
y = [1,2,3,15,22,27]

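I'd like the return value to be a dictionary holding, for each x value, the count of y elements that are less than it (as the example shows, each element is counted only once, under the smallest x value it falls below):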

{
    10:3,
    20:1,
    30:2,
}

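My lists are very large, so I was hoping there is a better way to do this that doesn't involve a slow nested for loop. I've looked at collections.Counter and itertools, and neither seems to offer a way of grouping. Is there a built-in that can do this?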

4 Answers

8

You can use the bisect module together with collections.Counter:

>>> import bisect
>>> from collections import Counter
>>> Counter(x[bisect.bisect_left(x, item)] for item in y)
Counter({10: 3, 30: 2, 20: 1})
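The one-liner above works for the sample data, but two edge cases seem worth noting: with bisect_left, a y value equal to an x value is counted under that x value (so it is not strictly "less than"), and a y value larger than every x value raises IndexError. A minimal sketch of one way to handle both, assuming x is sorted ascending and that values at or above the largest x value should simply be skipped (the question does not say); the helper name count_below is hypothetical:

```python
import bisect
from collections import Counter

def count_below(x, y):
    """Count each y value under the smallest x value it is strictly below.

    Assumes x is sorted ascending. Values >= x[-1] are skipped, which is
    an assumption: the question does not specify how to treat them.
    """
    counts = Counter()
    for item in y:
        # bisect_right gives the index of the first x value greater than item
        i = bisect.bisect_right(x, item)
        if i < len(x):
            counts[x[i]] += 1
    return dict(counts)

x = [10, 20, 30]
y = [1, 2, 3, 10, 15, 22, 27, 30, 45]
print(count_below(x, y))  # {10: 3, 20: 2, 30: 2}
```

Here 10 lands under 20 because it is not less than 10, while 30 and 45 are dropped.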
answered 2013-09-13T17:05:04.777
5

If you are willing to use numpy, what you are asking for is basically a histogram:

import numpy as np

x = [10,20,30]
y = [1,2,3,15,22,27]

np.histogram(y,bins=[0]+x)
#(array([3, 1, 2]), array([ 0, 10, 20, 30]))

To turn it into a dictionary:

b = np.histogram(y,bins=[0]+x)[0]
d = { k:v for k,v in zip(x, b)}

For short lists this isn't worth it, but if your lists are long it may be:

In [292]: y = np.random.randint(0, 30, 1000)

In [293]: %%timeit
   .....: b = np.histogram(y, bins=[0]+x)[0]
   .....: d = { k:v for k,v in zip(x, b)}
   .....: 
1000 loops, best of 3: 185 µs per loop

In [294]: y = list(y)

In [295]: timeit Counter(x[bisect.bisect_left(x, item)] for item in y)
100 loops, best of 3: 3.84 ms per loop

In [311]: timeit dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
100 loops, best of 3: 3.75 ms per loop
answered 2013-09-13T17:05:33.840
1

Short answer:

dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))

Long answer:

First we need to iterate over the y's to check which members are less than a given value. If we do it for 10 we get this:

>>> [n_y for n_y in y if n_y < 10]
[1, 2, 3]

Then we need to turn that '10' into a variable by looping through the x's:

>>> [[n_y for n_y in y if n_y < n_x] for n_x in x]
[[1, 2, 3], [1, 2, 3, 15], [1, 2, 3, 15, 22, 27]]

Finally, we need to pair these results with the original x's. This is where zip comes in handy:

>>> list(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
[(10, [1, 2, 3]), (20, [1, 2, 3, 15]), (30, [1, 2, 3, 15, 22, 27])]

This gives us a list of tuples, so we pass it to dict to get the final result:

>>> dict(zip(x, [[n_y for n_y in y if n_y < n_x] for n_x in x]))
{10: [1, 2, 3], 20: [1, 2, 3, 15], 30: [1, 2, 3, 15, 22, 27]}
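The result above holds the matching elements rather than counts, and each list is cumulative (every element below 10 also appears under 20 and 30). If per-range counts like those in the question are wanted, one possible follow-up, sketched here under the assumption that x is sorted ascending, is to count instead of collect and then subtract the previous total. The names cumulative and per_bin are illustrative only:

```python
x = [10, 20, 30]
y = [1, 2, 3, 15, 22, 27]

# Cumulative counts: how many y values are below each x value
cumulative = {n_x: sum(1 for n_y in y if n_y < n_x) for n_x in x}

# Subtract the previous cumulative count to get the count per range
per_bin = {}
previous = 0
for n_x in x:
    per_bin[n_x] = cumulative[n_x] - previous
    previous = cumulative[n_x]

print(cumulative)  # {10: 3, 20: 4, 30: 6}
print(per_bin)     # {10: 3, 20: 1, 30: 2}
```

This still scans all of y once per x value, so it keeps the nested-loop cost the question wanted to avoid.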
answered 2013-09-13T17:16:53.630
0

If the step between the x values is always 10, I would do it like this:

>>> y = [1,2,3,15,22,27]
>>> step = 10
>>> from collections import Counter
>>> Counter(n - n%step + step for n in y)
Counter({10: 3, 30: 2, 20: 1})
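The arithmetic above never looks at x, so it silently depends on x having the form [step, 2*step, ...]. A small sketch that makes this assumption explicit; the function name count_by_step is hypothetical, and y is assumed to hold non-negative integers:

```python
from collections import Counter

def count_by_step(x, y):
    """Bucket y values by a fixed step, assuming x == [step, 2*step, ...]."""
    step = x[0]
    # Guard the assumption that x is evenly spaced and starts at one step
    if any(value != step * (i + 1) for i, value in enumerate(x)):
        raise ValueError("x is not of the form [step, 2*step, ...]")
    # Round each value up to the next multiple of step
    return Counter(n - n % step + step for n in y)

print(count_by_step([10, 20, 30], [1, 2, 3, 15, 22, 27]))
```

Note that a y value at or above the largest x value produces a key that is not in x (for example, 35 is counted under 40).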
answered 2013-09-13T17:50:21.537