0

In a dictionary of sorted lists such as d == {1:[1,6,16],2:[1],7:[6]}, how would you efficiently delete every number less than a given value k from the lists (and hence also remove any key-value pair whose list ends up empty)? In my case, d will be large.

For example, if k = 15 then we should end up with d == {1:[16]}.

I initialized the dictionary in the first place using d = defaultdict(list).

I tried to use bisect to speed it up but I must have made a mistake.

Is it possible to use the fact the lists are sorted to make it fast?

4

5 Answers

3
>>> d = {1:[1,6,16],2:[1],7:[6]}
>>> for lst in d.values(): lst[:] = [x for x in lst if x >= 16]
... 
>>> d
{1: [16], 2: [], 7: []}
>>> for k in list(d):
...     if not d[k]:
...         del d[k]
... 
>>> d
{1: [16]}

>>> d = {1:[1,6,16],2:[1],7:[6]}
>>> tmp = [(k, [x for x in lst if x >= 16]) for k, lst in d.items()]
>>> d = {k: v for k, v in tmp if v}
>>> d
{1: [16]}

Using bisect.bisect_left:

>>> import bisect
>>> d = {1:[1,6,16],2:[1],7:[6]}
>>> for k in list(d):
...     d[k] = d[k][bisect.bisect_left(d[k], 16):]
...     if not d[k]:
...         del d[k]
... 
>>> d
{1: [16]}
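
The slicing version above builds a new list for every key. If you would rather trim the existing lists in place, the same bisect idea can be wrapped in a small function. This is only a sketch, assuming Python 3; the name prune_below is made up for illustration and is not part of any library.

```python
from bisect import bisect_left

def prune_below(d, k):
    """Remove, in place, every value < k from the sorted lists in d."""
    for key in list(d):          # snapshot of the keys, so deleting is safe
        lst = d[key]
        i = bisect_left(lst, k)  # index of the first element >= k
        if i == len(lst):
            del d[key]           # nothing survives: drop the key
        elif i:
            del lst[:i]          # trim the front of the same list object

d = {1: [1, 6, 16], 2: [1], 7: [6]}
prune_below(d, 15)
print(d)  # {1: [16]}
```

The binary search costs O(log n) per list, although deleting the front of a list still has to shift the surviving elements.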
answered 2013-07-26T07:07:29.310
2

You can do:

from collections import defaultdict
from bisect import bisect_left
d = {1:[1,6,16],2:[1],7:[6]}
d1 = defaultdict(list)
k = 15
for key, value in d.iteritems():
    temp = value[bisect_left(value, k):]  # use the threshold k defined above
    if temp:
        d1[key] = temp

print d1.items()

This prints:

[(1, [16])]
answered 2013-07-26T07:02:23.693
1

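I have the same feeling here as in another answer of mine: all the answers I have read seem to me to create a new object.
I prefer to modify the lists in place.

In the code below, I delete the unwanted part of each list (easy, since the lists are sorted), and I follow the EAFP coding style (Easier to Ask Forgiveness than Permission).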

d={1:[1,6,16,32,50],2:[1,5,15],7:[6,7,9],13:[10,12,23,55]}

k = 15
for ki,li in d.items():
    try:
        x = next(x for x in li if x>=k)
    except StopIteration:  # no element >= k, so the whole list goes
        del d[ki]
    else:
        i = li.index(x)
        li[0:i] = []

print d
# {1: [16, 32, 50], 2: [15], 13: [23, 55]}

.

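Edit 1

I changed the code. It is not ideal, because I had to iterate over d.items() instead of d.iteritems(): with the latter, the dictionary cannot be modified during iteration.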

.

Edit 2

I tried bisect_left(), and it is indeed the fastest solution. It is the third code below. The second is a corrected version of RussW's. The first is my earlier code.

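from time import clock  # Python 2; time.clock() was removed in Python 3.8
from bisect import bisect_left

k = 15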

te = clock()
for jj in xrange(10000):
    d={1:[1,6,16,32,50],2:[1,5,15],7:[6,7,9],13:[10,12,23,55]}
    for ki,li in d.items():
        try:
            x = next(x for x in li if x>=k)
        except StopIteration:
            del d[ki]
        else:
            i = li.index(x)
            li[0:i] = []
print clock() - te
print d
            
print '------------------------------------------'

d={1:[1,6,16,32,50],2:[1,5,15],7:[6,7,9],13:[10,12,23,55]}
te = clock()
for jj in xrange(10000):
    dct={1:[1,6,16,32,50],2:[1,5,15],7:[6,7,9],13:[10,12,23,55]}
    for key, lst in dct.items():
        gn = None
        for i, x in enumerate(lst):
            if x >= k:
                gn = i
                break
        if gn is None:
            del dct[key]
        else:
            dct[key] = lst[gn:]
print clock() - te
print dct
print '------------------------------------------'

te = clock()
for jj in xrange(10000):
    d={1:[1,6,16,32,50],2:[1,5,15],7:[6,7,9],13:[10,12,23,55]}
    for ki,li in d.items():
        i = bisect_left(li,k)
        if i==len(li):
            del d[ki]
        else:
            li[0:i] = []
print clock() - te
print d

Result

0.22918869577
{1: [16, 32, 50], 2: [15], 13: [23, 55]}
------------------------------------------
0.163871665254
{1: [16, 32, 50], 2: [15], 13: [23, 55]}
------------------------------------------
0.100142057161
{1: [16, 32, 50], 2: [15], 13: [23, 55]}
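
The lists timed above hold at most five elements, so they say little about large inputs, which is the case the question asks about. Here is a rough sketch for repeating the comparison on one long list, assuming Python 3 and the standard timeit module. The function names and the test data are made up for illustration, and no timings are claimed here.

```python
from bisect import bisect_left
from timeit import timeit

def cut_linear(lst, k):
    """Index of the first element >= k, found by scanning."""
    for i, x in enumerate(lst):
        if x >= k:
            return i
    return len(lst)

def cut_bisect(lst, k):
    """Same index, found by binary search (requires a sorted list)."""
    return bisect_left(lst, k)

data = list(range(0, 200000, 2))  # assumed test data: 100,000 sorted even numbers
k = 150001

# Both strategies must agree on where to cut.
assert cut_linear(data, k) == cut_bisect(data, k)

for fn in (cut_linear, cut_bisect):
    seconds = timeit(lambda: fn(data, k), number=20)
    print(fn.__name__, seconds)
```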
answered 2013-07-26T08:02:50.580
0
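>>> def sieve(dct, n):
    # Iterate over a snapshot: deleting keys while looping over
    # dct.iteritems() raises RuntimeError.
    for key, lst in list(dct.items()):
        gn = None
        for i, x in enumerate(lst):
            if x >= n:
                gn = i
                break
        if gn is None:
            del dct[key]
        else:
            dct[key] = lst[gn:]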


>>> d = {1:[1, 6, 16], 2:[1], 7:[6]}
>>> sieve(d, 15)
>>> d
{1: [16]}
>>> 
answered 2013-07-26T07:19:22.207
0
>>> import bisect
>>> d = {1: [1,6,16], 2: [1], 7: [6]}
>>> for k in list(d):  # copy the keys so entries can be deleted in the loop
...     d[k] = d[k][bisect.bisect_left(d[k], 16):]
...     if not d[k]:
...         del d[k]
...
>>> d
{1: [16]}
answered 2013-07-26T07:23:04.233