2

我有一个包含 1 或 0 的列表;没有其他的。我有兴趣找到 1,更具体地说,有兴趣找到 1 的运行开始位置和运行结束的位置(或在下面的代码中,1 运行的“长度”......它可以是“长度” “该运行的结束索引位置或该运行的结束索引位置,因为我可以做数学计算并从开始位置和结束位置计算出长度)。我将 1 的运行存储在哈希中。有没有比我拥有的更快的方法来获得我所追求的?我还在学习 python,我在现实生活中使用的列表要大得多,所以速度很重要。

previous = 0
cnt = 0
startLength = {} 
for r in listy: 
    if previous == 0 and r == 1:
        start = cnt
        startLength[start] = 1
    if previous == 1 and r == 1: 
        startLength[start] = 1 + cnt - start 
    previous = r
    cnt += 1

for s,l in startLength.iteritems():
    print "A run of 1's starts at position %s and lasts %s" % (s,l)
4

4 回答 4

7

我可能会用itertools.groupby这个

lst = [ 1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]

from itertools import groupby
from operator import itemgetter

for k,v in groupby(enumerate(lst),key=itemgetter(1)):
    if k:
        v = list(v)
        print v[0][0],v[-1][0]

这将打印 1 组的开始和结束索引

于 2013-03-07T19:49:08.250 回答
2

除了@mgilson's pythonic answer之外,您还可以尝试以下操作:

lst = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]

start, end = False, False

for i, x in enumerate(lst):
    if x == 1 and start is False:
        start = i
    if x == 0 and start is not False and end is False:
        end = i-1
    if start is not False and end is not False:
        print start, end  # and len is (end-start+1)
        start, end = False, False

if start is not False:
    print start, i

输出

0 4
12 15
22 23
于 2013-03-07T20:10:40.077 回答
1

这是一个稍微更有效的解决方案(对不起,它是 JavaScript)。关键是只存储一次长度,在您的代码中,每次长度增加一个“startLength [start] = 1 + cnt - start”时,您都会进行计算。

通过使用条件“if previous == 0 and r == 1”而不是“if previous == 1 and r == 1”。我减少了计算量,但我还必须在 for 循环之后添加一个“if r == 1”来捕获最终情况。

var test=[0,1,1,1,0,0,0,1,1,0,0,1,0];
function runs(arr) {
    var result = {};
    var start = 0;
    var previous = 0;
    var cnt = 0;
    var r = 0;
    for(; cnt<arr.length; cnt++) {
        var r = arr[cnt];
        if(r == 1 && previous == 0)
            start = cnt;
        if(r == 0 && previous == 1)
            result[start] = cnt - start;
        previous = r;
    }
    if(r == 1)
        result[start] = cnt - start;
    return result;
}
var result = runs(test);
for(var start in result)
    console.log("start " + start + " length " + result[start]);

编辑 2这是一个 python 基准测试,显示使用 groupby 函数(目前是这个问题的最佳答案)要慢得多。

from itertools import groupby
from operator import itemgetter
import random
import time

lst = [ 1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]

def makeList(size):
    random.seed()
    return [random.randint(0,1) for r in xrange(size)]


def runs1(lst, showOutput):
    startLength = {}
    for k,v in groupby(enumerate(lst),key=itemgetter(1)):
        if k:
            v = list(v)
            startLength[v[0][0]] = v[-1][0] + 1 - v[0][0]
    if showOutput == True:
        for s,l in startLength.iteritems():
            print s,l

def runs2(lst, showOutput):
    previous = 0
    cnt = 0
    startLength = {} 
    for r in lst: 
        if previous == 0 and r == 1:
            start = cnt
        if previous == 1 and r == 0: 
            startLength[start] = cnt - start
        previous = r
        cnt += 1
    if r == 1:
        startLength[start] = cnt - start
    if showOutput == True:
        for s,l in startLength.iteritems():
            print s,l

testList = makeList(10)
print "slow version"
runs1(testList, True)
print "fast version"
runs2(testList, True)

benchmarkList = makeList(10000)

start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start

start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start

start = time.time()
runs1(benchmarkList, False)
print "slow ", time.time() - start
start = time.time()
runs2(benchmarkList, False)
print "fast ", time.time() - start
于 2013-03-07T20:50:43.793 回答
-1

如果您将列表转换为字符串,则可以为此使用正则表达式。

import re
import random

int_list = [random.randint(0,1) for i in xrange(1000000)]
runs = re.findall('1+', ''.join(map(str, int_list) # get a list of one-runs

# Print their lengths.
for run in runs:
    print len(run)

# If you really need to know the indexes where the runs are found, instead use
runs = re.finditer('1+', ''.join(map(str, int_list) # get a list of matches
for run in runs:
    print 'Start:\t%s' % run.start()
    print 'End:\t%s' % run.end()
    print 'Length:\t%s' % run.end()-run.start()
于 2013-03-07T19:46:56.250 回答