python - Accessing consecutive items when using a generator

Question

Suppose I have a generator over a tuple, which I simulate as follows:

g = (x for x in (1,2,3,97,98,99))

For this particular generator, I want to write a function that outputs the following:

(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)

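So I iterate over three consecutive items at a time and print them, except near the end, where fewer items remain.

Should the first line of my function be:

t = tuple(g)

In other words, is it best to work with a tuple directly, or might it be beneficial to work with the generator? If the problem can be approached both ways, please state the advantages and disadvantages of each. Also, if the generator approach is the sensible one, what might such a solution look like?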

Here is what I currently do:

def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])

data = (x for x in (1,2,3,4,5))
f(data,3)

Update:

Note that I have updated my function to take a second argument that specifies the length of the window.

score 3 · Accepted Answer

Actually, the itertools module has functions for this: tee() and izip_longest() (named zip_longest() in Python 3, which the session below uses):

>>> from itertools import tee, zip_longest
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
1
>>> next(c, None)
1
>>> next(c, None)
2
>>> [[x for x in l if x is not None] for l in zip_longest(a, b, c)]
[[1, 2, 3], [2, 3, 97], [3, 97, 98], [97, 98, 99], [98, 99], [99]]

From the documentation:

Return n independent iterators from a single iterable. Equivalent to:

import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:   # stop cleanly (required since Python 3.7)
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
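
To illustrate what the equivalent code above implies, here is a small sketch (the variable names are only for illustration): each iterator returned by tee() advances independently, and values are buffered for whichever copy lags behind.

```python
from itertools import tee

g = (x for x in (1, 2, 3, 97, 98, 99))
a, b = tee(g, 2)

# Advancing one copy does not advance the other.
first_two_of_a = [next(a), next(a)]
first_of_b = next(b)

# Each copy still sees all of its own remaining items.
rest_of_a = list(a)
rest_of_b = list(b)

print(first_two_of_a, first_of_b)
print(rest_of_a)
print(rest_of_b)
```

After calling tee(), the original generator g should not be advanced directly, because the copies would silently miss those values.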

score 3 · Accepted Answer

A specific example that returns three items at a time could read:

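def yield3(gen):
    # Prime the window with the first two items (assumes at least two exist).
    b, c = next(gen), next(gen)
    try:
        while True:
            a, b, c = b, c, next(gen)
            yield (a, b, c)
    except StopIteration:
        # The generator is exhausted: emit the shrinking tail.
        yield (b, c)
        yield (c,)


g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print(l)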
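
The function above fetches two items before entering its loop, so an input with fewer than two items raises StopIteration at that point, and since Python 3.7 this surfaces as a RuntimeError inside a generator. As a hedged sketch only, a hypothetical variant (the name yield3_safe is made up here) that tolerates short inputs might look like this:

```python
def yield3_safe(gen):
    """Hypothetical variant of yield3 that also accepts very short inputs."""
    window = []
    for item in gen:
        window.append(item)
        if len(window) == 3:
            yield tuple(window)
            window.pop(0)  # drop the oldest item, keep the last two
    # The input is exhausted: emit the shrinking tail windows.
    while window:
        yield tuple(window)
        window.pop(0)


print(list(yield3_safe(x for x in (1, 2, 3, 97, 98, 99))))
print(list(yield3_safe(x for x in (1,))))
```

For inputs with two or more items it yields the same windows as yield3; for a single item it yields one 1-tuple, and for an empty input it yields nothing.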

score 2 · Accepted Answer

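If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck", short for "double-ended queue") can efficiently push and pop values at both ends.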

from collections import deque
from itertools import islice

def get_tuples(gen, n):
    q = deque(islice(gen, n))   # pre-load the queue with `n` values
    while q:                    # run until the queue is empty
        yield tuple(q)          # yield a tuple copied from the current queue
        q.popleft()             # remove the oldest value from the queue
        try:
            q.append(next(gen)) # try to add a new value from the generator
        except StopIteration:
            pass                # but we don't care if there are none left
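
To show why this matters, here is a small usage sketch. It repeats the function so that it runs on its own, and feeds it itertools.count(), an endless iterator chosen here purely for illustration; calling tuple() on such an input would never return.

```python
from collections import deque
from itertools import count, islice

def get_tuples(gen, n):
    # Same logic as the function above, repeated so this snippet is runnable.
    q = deque(islice(gen, n))
    while q:
        yield tuple(q)
        q.popleft()
        try:
            q.append(next(gen))
        except StopIteration:
            pass

# Only the first four windows are ever computed from the endless input.
first_windows = list(islice(get_tuples(count(1), 3), 4))
print(first_windows)

# On the finite generator from the question, the tail windows shrink.
g = (x for x in (1, 2, 3, 97, 98, 99))
question_windows = list(get_tuples(g, 3))
print(question_windows)
```

Note that gen must be an iterator (such as a generator), since the function calls next() on it directly.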

score 1 · Accepted Answer

It's definitely best to work with the generator because you don't want to have to hold everything in memory.

It can be done very simply with a deque.

from collections import deque
from itertools import islice

def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """

    chunker = deque(maxlen=size)
    iterator = iter(iterable)

    for item in islice(iterator, size-1):
        chunker.append(item)

        if head:
            yield list(chunker)

    for item in iterator:
        chunker.append(item)
        yield list(chunker)

    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker)

score 1 · Accepted Answer

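Actually, it depends.

A generator can be useful for very large collections, where you don't need to hold all the items in memory to get the result you want. On the other hand, since you have to print the output, it seems safe to guess that the collection is not huge, so it won't make much difference.

However, here is a generator that does what you want: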
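
As a rough illustration of the memory point, the sketch below compares the size of a generator object with that of a tuple built from the same values. The size of one million items is an arbitrary choice, and sys.getsizeof() reports only the container itself, not the integers it refers to, so the real gap is larger.

```python
import sys

n = 10**6  # arbitrary size, chosen only for illustration

as_generator = (x for x in range(n))
as_tuple = tuple(range(n))

gen_size = sys.getsizeof(as_generator)   # small and independent of n
tuple_size = sys.getsizeof(as_tuple)     # grows linearly with n

print(gen_size, tuple_size)
```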

def part(gen, size):
    t = tuple()
    try:
        while True:
            l = next(gen)
            if len(t) < size:
                # Still filling the first window.
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                # Slide the window forward by one item.
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        # Input exhausted: emit the shrinking tail.
        while len(t) > 1:
            t = t[1:]
            yield t

>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>>

Note: the code is not very elegant, but it also works when you need windows of another size, such as 5.

score 0 · Accepted Answer

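I think what you are currently doing is much simpler than any of the above. Unless there is a particular need to make it more complicated, my opinion is to keep it simple. In other words, it is best to work with a tuple directly.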

score 0 · Accepted Answer

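Here is a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators wherever possible, so it should be relatively memory-efficient.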

try:
    from itertools import izip_longest, takewhile, tee
except ImportError:  # Python 3
    from itertools import zip_longest as izip_longest, takewhile, tee

def tuple_window(n, iterable):
    # tee() yields n independent iterators, so one-shot generators work too.
    iterators = tee(iterable, n)
    for offset, iterator in enumerate(iterators):
        for _ in range(offset):
            next(iterator, None)  # Advance the i-th iterator by i items.
    _NULL = object()  # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))

if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)

Output:

(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)
