如果我理解正确的话很简单......
from itertools import groupby
data = "abbbccac"
print [(k, len(list(g))) for k,g in groupby(data)]
哇 - 将 Paul 的功能与我的非常相似的功能进行比较,得到了非常令人惊讶的结果。事实证明,加载列表的速度要快 10-100 倍。更令人惊讶的是,随着块变大,列表实现具有更大的优势。
我想这就是我 ♥ Python 的原因——它使简洁的表达式更好地工作——即使它有时看起来很神奇。
自己检查数据:
from itertools import groupby
from timeit import Timer
data = "abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac"
def rle_walk(gen):
ilen = lambda gen : sum(1 for x in gen)
return [(ch, ilen(ich)) for ch,ich in groupby(data)]
def rle_list(data):
return [(k, len(list(g))) for k,g in groupby(data)]
# randomy data
t = Timer('rle_walk("abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac")', "from __main__ import rle_walk; gc.enable()")
print t.timeit(1000)
t = Timer('rle_list("abbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccacabbbccac")', "from __main__ import rle_list; gc.enable()")
print t.timeit(1000)
# chunky blocks
t = Timer('rle_walk("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccccccccccccccccccccccccccccccccccccccc")', "from __main__ import rle_walk; gc.enable()")
print t.timeit(1000)
t = Timer('rle_list("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccccccccccccccccccccccccccccccccccccccc")', "from __main__ import rle_list; gc.enable()")
print t.timeit(1000)
给我的结果(越小越好):
1.42423391342 # lambda and walk - small blocks
0.145968914032 # make list - small blocks
1.41816806793 # lambda and walk - large blocks
0.0165541172028 # make list - large blocks