python - A Pythonic pattern for generators that group items

Question

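I'm looking for a good way to write a generator that takes a stream of items from another list/generator/iterable and groups them.

Splitting items up is easy. For example, if we want to take the lines of a file and split them into characters: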

def lines2chars(filename):
    with open(filename) as fh:
        for line in fh:                 # Iterate over items
            for char in line:           # Split items up
                yield char              # Yield smaller items

Grouping them up, to produce paragraphs for example, seems trickier. This is what I've come up with:

def lines2para(filename):
    with open(filename) as fh:
        paragraph = []                  # Start with an empty group

        while True:                     # Infinite loop, ended when the input runs out
            try:
                line = next(fh)         # Get a line
            except StopIteration:
                                        # If there isn't one...
                                        # do whatever necessary
                return                  # and end the generator (since Python 3.7,
                                        # re-raising StopIteration here would
                                        # surface as RuntimeError, see PEP 479)
            else:
                paragraph.append(line)  # Add to the group of items
                if line == "\n":        # If we've got a whole group
                    yield paragraph     # yield it
                    paragraph = []      # and start a new group

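To my eye this isn't pretty. It uses the internals of the iteration protocol, it has an infinite loop that has to be broken out of, and it doesn't read well to me. Does anyone have a better way to write this kind of code?

Keep in mind that I'm looking for the pattern, not this specific example. In my case I'm reading packets that are split across packets, but each level is similar to the paragraph example.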

score 1 · Accepted Answer

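import itertools as it

def lines2para(filename):
    with open(filename) as fh:
        # groupby yields runs of consecutive lines that share the same key:
        # True for lines with content, False for blank lines.
        for k, v in it.groupby(fh, key=lambda x: bool(x.strip())):
            if k:                       # Skip the runs of blank lines
                yield list(v)           # One paragraph per non-blank run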
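
Since the question asks for a general pattern rather than this one example, here is a minimal sketch of two ways it might be generalized. The names `group_until` and `is_terminator` are hypothetical, introduced only for illustration. The first variant keeps the terminator item in each group (as the question's code does) and uses a plain `for` loop, so there is no explicit `next()` or `StopIteration` handling. The second applies the `groupby` approach to an in-memory list and drops the separators.

```python
import itertools


def group_until(items, is_terminator):
    """Yield lists of items, closing a group after each terminator item.

    Hypothetical helper: `items` is any iterable, `is_terminator` is a
    predicate that returns True for the item that ends a group.
    """
    group = []
    for item in items:              # The for loop consumes StopIteration for us
        group.append(item)
        if is_terminator(item):     # The group is complete
            yield group
            group = []
    if group:                       # Flush a trailing group with no terminator
        yield group


lines = ["a\n", "b\n", "\n", "c\n"]

# Variant 1: the blank line stays at the end of its paragraph.
paragraphs = list(group_until(lines, lambda line: line == "\n"))

# Variant 2: groupby on "is this line non-blank?", keeping only the True runs.
by_blank = [
    list(run)
    for has_text, run in itertools.groupby(lines, key=lambda x: bool(x.strip()))
    if has_text
]
```

Which variant fits depends on whether the separator is part of the group. For the packet case mentioned in the question, a predicate that recognizes the last fragment of a packet could play the role of `is_terminator`, though that is an assumption about the data format.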
