python - 确定空列表条目是否“连续”的 Pythonic 方法

Question

我正在寻找一种方法来轻松确定列表中的所有非 None 项是否都出现在单个连续切片中。 我将使用整数作为非 None 项的示例。

例如，该列表[None, None, 1, 2, 3, None, None]满足我对连续整数条目的要求。相比之下，[1, 2, None, None, 3, None]不是连续的，因为整数之间没有 None 条目。

更多示例以使这一点尽可能清楚。

连续：
[1, 2, 3, None, None]
[None, None, 1, 2, 3]
[None, 1, 2, 3, None]

不连续：
[None, 1, None, 2, None, 3]
[None, None, 1, None, 2, 3]
[1, 2, None, 3, None, None]

我的第一种方法是使用变量来跟踪我们是否遇到过None，以及我们是否遇到过int尚未——这最终会产生高度嵌套且非常难以遵循的一系列 if/else嵌入在 for 循环中的语句。（除了丑陋之外，我承认我并没有让它在每种情况下都能正常工作）。

任何人都知道一种更简单的方法来确定列表中的非无项是否出现在单个连续切片中？

score 44 · Accepted Answer

def contiguous(seq):
    seq = iter(seq)
    all(x is None for x in seq)        # Burn through any Nones at the beginning
    any(x is None for x in seq)        # and the first group
    return all(x is None for x in seq) # everthing else (if any) should be None.

这里有几个例子。您可以使用next(seq)从迭代器中获取下一项。我会在每个之后放置一个指向下一个项目的标记

示例1：

seq = iter([None, 1, 2, 3, None])        #  [None, 1, 2, 3, None]
                                         # next^
all(x is None for x in seq)            
                                         #        next^
any(x is None for x in seq)            
                                         #                    next^ (off the end)
return all(x is None for x in seq)       # all returns True for the empty sequence

示例2：

seq = iter([1, 2, None, 3, None, None])  #    [1, 2, None, 3, None, None]
                                         # next^
all(x is None for x in seq)            
                                         #    next^
any(x is None for x in seq)            
                                         #             next^  
return all(x is None for x in seq)       # all returns False when 3 is encountered

score 25 · Accepted Answer

很好itertools.groupby的救援：

from itertools import groupby

def contiguous(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

给

>>> contiguous([1,2,3,None,None])
True
>>> contiguous([None, 1,2,3,None])
True
>>> contiguous([None, None, 1,2,3])
True
>>> contiguous([None, 1, None, 2,3])
False
>>> contiguous([None, None, 1, None, 2,3])
False
>>> contiguous([None, 1, None, 2, None, 3])
False
>>> contiguous([1, 2, None, 3, None, None])
False

[编辑]

由于评论中似乎有一些讨论，我将解释为什么我比其他一些方法更喜欢这种方法。

我们试图找出是否存在一组连续的非 None 对象，并且

sum(1 for k,g in groupby(seq, lambda x: x is not None) if k)

使用stdlib中的函数计算连续的非None对象的数量，该函数旨在收集连续的组。一看到groupby，我们就认为是“连续组”，反之亦然。从这个意义上说，它是自我记录的。这基本上是我的目标的定义。

恕我直言，唯一的弱点是它不会短路，这可以修复，但经过思考后，我仍然更喜欢它，因为它使用了我喜欢的原语——“计算连续非无组的数量” - 我更喜欢简单地“尽快告诉我是否有多个连续的非无组”。

实现最后一个的许多方法都依赖于对问题的巧妙观察，例如“如果只有一组连续的非无对象，那么如果我们扫描直到找到第一个非无对象，然后扫描对象直到我们找到第一个非 None 组（如果存在），那么剩下的是否为 None 给了我们答案。” （或者类似的东西，这是我的问题的一部分：我必须考虑一下。）对我来说，这感觉就像使用有关问题的“实施细节”来解决它，并专注于我们可以用来解决的问题的属性它，而不是简单地将问题指定给 Python 并让 Python 完成工作。

正如俗话说的那样，我是一只大脑很少的熊，我喜欢避免变得聪明，因为根据我的经验，这是一条充满失败的路线。

与往常一样，当然，每个人的里程可能会有所不同，而且可能与他们的聪明程度成正比。

score 12 · Accepted Answer

你可以使用类似的东西itertools.groupby：

from itertools import groupby

def are_continuous(items):
    saw_group = False

    for group, values in groupby(items, lambda i: i is not None):
        if group:
            if saw_group:
                return False
            else:
                saw_group = True

    return True

这只会迭代直到它看到一个组两次。我不确定您是否考虑[None, None]，因此请根据您的需要进行调整。

score 7 · Accepted Answer

这可能不是最好的方法，但您可以查找第一个非 None 条目和最后一个non-None条目，然后检查切片是否存在None. 例如：

def is_continuous(seq):
    try:
        first_none_pos = next(i for i,x in enumerate(seq) if x is not None)
        #need the or None on the next line to handle the case where the last index is `None`.
        last_none_pos = -next(i for i,x in enumerate(reversed(seq)) if x is not None) or None
    except StopIteration: #list entirely of `Nones`
        return False
    return None not in seq[first_none_pos:last_none_pos]

assert is_continuous([1,2,3,None,None]) == True
assert is_continuous([None, 1,2,3,None]) == True
assert is_continuous([None, None, 1,2,3]) == True
assert is_continuous([None, 1, None, 2,3]) == False
assert is_continuous([None, None, 1, None, 2,3]) == False
assert is_continuous([None, 1, None, 2, None, 3]) == False
assert is_continuous([1, 2, None, 3, None, None]) == False

这适用于任何序列类型。

score 7 · Accepted Answer

使用序列元素的自然方式是使用dropwhile：

from itertools import dropwhile
def continuous(seq):
    return all(x is None for x in dropwhile(lambda x: x is not None,
                                            dropwhile(lambda x: x is None, seq)))

我们可以在没有嵌套函数调用的情况下表达这一点：

from itertools import dropwhile
def continuous(seq):
    core = dropwhile(lambda x: x is None, seq)
    remainder = dropwhile(lambda x: x is not None, core)
    return all(x is None for x in remainder)

score 5 · Accepted Answer

一个班轮：

contiguous = lambda l: ' ' not in ''.join('x '[x is None] for x in l).strip()

真正的工作是由strip函数完成的。如果剥离的字符串中有空格，则它们不是前导/尾随。该函数的其余部分将列表转换为字符串，每个字符串都有一个空格None。

score 3 · Accepted Answer

我做了一些分析以将@gnibbler 的方法与 groupby 方法进行比较。@gnibber 的方法始终更快，尤其是。对于更长的列表。例如，对于长度为 3-100 的随机输入，我看到大约 50% 的性能增益，有 50% 的机会包含单个 int 序列（随机选择），否则包含随机值。测试代码如下。我穿插了这两种方法（随机选择先使用哪一种）以确保消除任何缓存效果。基于此，我想说虽然 groupby 方法更直观，但如果分析表明这是要优化的整体代码的重要部分，@gnibber 的方法可能是合适的——在这种情况下，应该使用适当的注释来指示使用 all/any 到消费者迭代器值的情况。

from itertools import groupby
import random, time

def contiguous1(seq):
    # gnibber's approach
    seq = iter(seq)
    all(x is None for x in seq)        # Burn through any Nones at the beginning
    any(x is None for x in seq)        # and the first group
    return all(x is None for x in seq) # everthing else (if any) should be None.

def contiguous2(seq):
    return sum(1 for k,g in groupby(seq, lambda x: x is not None) if k) == 1

times = {'contiguous1':0,'contiguous2':0}

for i in range(400000):
    n = random.randint(3,100)
    items = [None] * n
    if random.randint(0,1):
        s = random.randint(0,n-1)
        e = random.randint(0,n-s)
        for i in range(s,e):
            items[i] = 3
    else:
        for i in range(n):
            if not random.randint(0,2):
                items[i] = 3
    if random.randint(0,1):
        funcs = [contiguous1, contiguous2]
    else:
        funcs = [contiguous2, contiguous1]
    for func in funcs:
        t0 = time.time()
        func(items)
        times[func.__name__] += (time.time()-t0)

print
for f,t in times.items():
    print '%10.7f %s' % (t, f)

score 2 · Accepted Answer

这是一个受 numpy 启发的解决方案。获取所有非空元素的数组索引。然后，将每个索引与其后面的索引进行比较。如果差值大于 1，则非空值之间存在空值。如果没有下一个索引大于 1 的索引，则没有间隙。

def is_continuous(seq):
    non_null_indices = [i for i, obj in enumerate(seq) if obj is not None]
    for i, index in enumerate(non_null_indices[:-1]):
        if non_null_indices[i+1] - index > 1:
            return False
    return True

score 1 · Accepted Answer

这个算法有一些缺点（它从列表中删除项目）。但这是一个解决方案。

基本上，如果您None从开始和结束中删除所有连续的。如果您None在列表中找到一些整数，则整数不是连续形式。

def is_continuous(seq):
    while seq and seq[0] is None: del seq[0]
    while seq and seq[-1] is None: del seq[-1]

    return None not in seq

assert is_continuous([1,2,3,None,None]) == True
assert is_continuous([None, 1,2,3,None]) == True
assert is_continuous([None, None, 1,2,3]) == True
assert is_continuous([None, 1, None, 2,3]) == False
assert is_continuous([None, None, 1, None, 2,3]) == False
assert is_continuous([None, 1, None, 2, None, 3]) == False
assert is_continuous([1, 2, None, 3, None, None]) == False

然而，小代码如何变得邪恶的另一个例子。

我希望有一种strip()方法可用于list.

score 1 · Accepted Answer

我的第一种方法是使用变量来跟踪......

...这最终导致高度嵌套且非常难以遵循嵌入在 for 循环中的一系列 if/else 语句...

不！实际上你只需要一个变量。用你的方法从有限状态机（FSM）的角度思考这个问题将导致一个非常好的解决方案。

我们称之为状态p。起初，p是 0。然后我们开始在状态之间行走。

密克罗尼西亚联邦

当检查列表中的所有元素并且仍然没有失败时，答案是True.

一种在字典中编码翻译表的版本

def contiguous(s, _D={(0,0):0, (0,1):1, (1,0):2, (1,1):1, (2,0):2, (2,1):3}):
    p = 0
    for x in s:
        p = _D[p, int(x is not None)]
        if p >= 3: return False
    return True

另一个使用 if 语句的版本：

def contiguous(s):
    p = 0
    for x in s:
        if x is None and p == 1 or x is not None and (p == 0 or p == 2):
            p += 1
        if p >= 3: return False
    return True

所以我的观点是使用if并且for仍然是pythonic。

更新

我找到了另一种编码 FSM 的方法。我们可以将转换表打包成一个 12 位整数。

def contiguous(s):
    p = 0
    for x in s:
        p = (3684 >> (4*p + 2*(x!=None))) & 3
        if p >= 3: return False
    return True

这里的幻数 3684 可以通过以下方式获得：

    _D[p,a]     3  2  1  2  1  0
         p      2  2  1  1  0  0
         a      1  0  1  0  1  0
bin(3684) = 0b 11 10 01 10 01 00

可读性不如其他版本，但它更快，因为它避免了字典查找。第二个版本和这个一样快，但是这种编码思想可以推广到解决更多问题。

score 0 · Accepted Answer

这是一种仅使用 numpy 的方法：

a = np.array([1, 2, 3, np.nan, 4, 5, np.nan, 6, 7])

# This returns indices of nans
# eg. [[3], [6]]
# use .squeeze() to convert to [3, 6]
aa = np.argwhere(a != a).squeeze()

# use a diff on your array , if the nans
# are continuous, the diff will always be 1
# if not, diff will be > 1 , and using any() will return True
any(np.diff(aa) > 1)

python - 确定空列表条目是否“连续”的 Pythonic 方法

11 回答 11

更新

Related

Reference