5

Suppose I have a generator of tuple items, which I simulate as follows:

g = (x for x in (1,2,3,97,98,99))

For this particular generator, I would like to write a function that outputs the following:

(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99,)

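So I iterate over three consecutive items at a time and print them, except when I get close to the end.

Should the first line of my function be:

t = tuple(g)

In other words, is it better to work with a tuple directly, or might it be beneficial to work with the generator? If the problem can be approached either way, please state the pros and cons of both approaches. Also, if the generator approach might be sensible, what would such a solution look like?

Here is what I have so far: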

def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])

data = (x for x in (1,2,3,4,5))
f(data,3)

Update

Note that I have updated my function to take a second argument that specifies the length of the window.

4

7 Answers

3

Actually, the itertools module has functions for this: tee() and izip_longest() (named zip_longest() in Python 3).

>>> from itertools import izip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
1
>>> next(c, None)
1
>>> next(c, None)
2
>>> [tuple(x for x in l if x is not None) for l in izip_longest(a, b, c)]
[(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]
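
The same tee-and-stagger idea can probably be generalized to any window length. The following is a minimal Python 3 sketch, not part of the original answer: the name `windows` and the `missing` sentinel are made up for illustration, and the sentinel is there only so that `None` can safely appear in the data.

```python
from itertools import tee, zip_longest

def windows(iterable, n):
    """Yield overlapping windows of up to n items, shrinking at the end."""
    missing = object()  # hypothetical sentinel; lets None appear in the data
    iterators = tee(iterable, n)
    # Advance the k-th copy by k items so the copies are staggered.
    for k, it in enumerate(iterators):
        for _ in range(k):
            next(it, None)
    # zip_longest pads the shorter copies; drop the padding again.
    for group in zip_longest(*iterators, fillvalue=missing):
        yield tuple(x for x in group if x is not missing)

g = (x for x in (1, 2, 3, 97, 98, 99))
print(list(windows(g, 3)))
# [(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]
```

Note that tee() buffers the items that the leading copy has consumed but the trailing copies have not, so at most about n items are held at once.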

From the documentation:

Return n independent iterators from a single iterable. Equivalent to:

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
answered 2013-09-20T10:45:07.980
3

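A specific example that returns three items at a time could read

def yield3(gen):
    b, c = next(gen), next(gen)  # assumes at least two items
    try:
        while True:
            a, b, c = b, c, next(gen)
            yield (a, b, c)
    except StopIteration:
        # input exhausted: emit the shrinking tail
        yield (b, c)
        yield (c,)


g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print(l)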
answered 2013-09-20T10:38:28.737
2

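If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck", short for "double-ended queue") can efficiently push and pop values at both ends.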

from collections import deque
from itertools import islice

def get_tuples(gen, n):
    q = deque(islice(gen, n))   # pre-load the queue with `n` values
    while q:                    # run until the queue is empty
        yield tuple(q)          # yield a tuple copied from the current queue
        q.popleft()             # remove the oldest value from the queue
        try:
            q.append(next(gen)) # try to add a new value from the generator
        except StopIteration:
            pass                # but we don't care if there are none left
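
As a rough usage sketch (the function is repeated from above only so that the snippet runs on its own; the use of itertools.count as an endless input is an added illustration, not something from the answer), the deque version can be driven like this:

```python
from collections import deque
from itertools import count, islice

def get_tuples(gen, n):
    # Same logic as the function above, repeated so this snippet runs alone.
    q = deque(islice(gen, n))
    while q:
        yield tuple(q)
        q.popleft()
        try:
            q.append(next(gen))
        except StopIteration:
            pass

# Finite generator: the window shrinks once the input runs out.
g = (x for x in (1, 2, 3, 97, 98, 99))
print(list(get_tuples(g, 3)))
# [(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]

# Endless iterator: only the windows that are requested get built.
print(list(islice(get_tuples(count(), 4), 3)))
# [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
```

The second call would never finish with a tuple-based approach, since tuple(count()) cannot complete; here at most n items are held in the deque at any time.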
answered 2013-09-20T10:55:34.800
1

It's definitely best to work with the generator because you don't want to have to hold everything in memory.

It can be done very simply with a deque.

from collections import deque
from itertools import islice

def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """

    chunker = deque(maxlen=size)
    iterator = iter(iterable)

    for item in islice(iterator, size-1):
        chunker.append(item)

        if head:
            yield list(chunker)

    for item in iterator:
        chunker.append(item)
        yield list(chunker)

    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker) 
answered 2013-09-20T10:46:11.353
1

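Actually, it depends.

A generator can be useful for very large collections, where you don't need to hold all the items in memory to get the result you want. On the other hand, since you have to print everything, it seems safe to guess that the collection is not huge, so it won't make much of a difference.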
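
To make the memory trade-off a little more concrete, here is a small sketch (an added illustration; the size of one million items is an arbitrary assumption, and exact byte counts depend on the Python build, so none are shown):

```python
import sys

n = 1000000                       # arbitrary size, chosen for illustration
as_tuple = tuple(range(n))        # every item is stored up front
as_gen = (x for x in range(n))    # items are produced on demand

# getsizeof reports the container only, not the integers it refers to.
print(sys.getsizeof(as_tuple))    # grows with n
print(sys.getsizeof(as_gen))      # small and independent of n
```

For an input of six items, as in the question, the difference is negligible.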

However, here is a generator that does what you want:

def part(gen, size):
    t = tuple()
    try:
        while True:
            l = next(gen)
            if len(t) < size:
                # still filling the first window
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                # slide the window forward by one item
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        # input exhausted: emit the shrinking tail
        while len(t) > 1:
            t = t[1:]
            yield t

>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>> 

Note: the code is not very elegant, but it also works when you need windows of 5 items.

answered 2013-09-20T10:39:29.210
0

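I think what you have so far looks much simpler than any of the above. Unless there is a particular need to make it more complicated, my opinion is to keep it simple. In other words, it is best to work with a tuple directly.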
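
For reference, the tuple-based approach can also return the windows instead of printing them. This is only a sketch; the name `tuple_windows` is made up for illustration and does not appear in the question.

```python
def tuple_windows(data, size):
    """Return every window of up to `size` items as a list of tuples."""
    t = tuple(data)  # materializes the whole input in memory
    # Slicing past the end simply yields a shorter tuple, which gives the tail.
    return [t[j:j + size] for j in range(len(t))]

data = (x for x in (1, 2, 3, 97, 98, 99))
for window in tuple_windows(data, 3):
    print(window)
```

The cost of this simplicity is that the entire input must fit in memory and must be finite.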

answered 2013-09-20T10:55:13.407
0

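Here is a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators wherever possible, so it should be relatively memory-efficient.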

from itertools import takewhile, tee
try:
    from itertools import izip_longest  # Python 2
except ImportError:  # Python 3
    from itertools import zip_longest as izip_longest

def tuple_window(n, iterable):
    # tee() gives n independent iterators; calling iter() n times would not
    # when `iterable` is itself a generator, since iter(gen) is gen.
    iterators = tee(iterable, n)
    for skip, iterator in enumerate(iterators):
        for _ in range(skip):
            next(iterator, None)  # stagger; tolerate short inputs
    _NULL = object()  # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))

if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)

Output:

(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)
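
One thing worth keeping in mind when a generator, rather than a tuple, is passed to a function like this: calling iter() on a generator returns the generator itself, whereas itertools.tee() returns independent iterators. A small sketch of the difference (an added illustration, Python 3 syntax):

```python
from itertools import tee

g = (x for x in (1, 2, 3))
a, b = iter(g), iter(g)
print(a is b)              # True: both names refer to the one generator
print(next(a), next(b))    # 1 2  (b continues where a left off)

g = (x for x in (1, 2, 3))
a, b = tee(g, 2)
print(next(a), next(b))    # 1 1  (each copy keeps its own position)
```

With a tuple or list as input, each iter() call does produce a fresh iterator, so the distinction only matters for one-shot iterables such as generators.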
answered 2013-09-20T15:34:54.620