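I'm looking for a good way to write a generator that takes a stream of items from another list/generator/iterable and groups them.
Splitting items up is easy. For example, if we want to take the lines of a file and split them into characters: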
def lines2chars(filename):
    with open(filename) as fh:
        for line in fh:          # Iterate over items
            for char in line:    # Split items up
                yield char       # Yield smaller items
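The same splitting step can be written against any iterable rather than a filename. This is only a minimal sketch: `split_items` is a hypothetical name, and the standard library's `itertools.chain.from_iterable` does the flattening.

```python
from itertools import chain

def split_items(items):
    """Return an iterator over the elements of each item (hypothetical helper)."""
    # chain.from_iterable lazily flattens exactly one level of nesting
    return chain.from_iterable(items)

# Each string is split into its characters, in order
chars = list(split_items(["ab\n", "c\n"]))
```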
Grouping them, to produce paragraphs for example, seems tricky. This is what I've come up with:
def lines2para(filename):
    with open(filename) as fh:
        paragraph = []                  # Start with an empty group
        while True:                     # Infinite loop, ended when input runs out
            try:
                line = next(fh)         # Get a line
            except StopIteration:
                # If there isn't one...
                # do whatever necessary
                # and end the generator. Re-raising StopIteration here would
                # become a RuntimeError on Python 3.7+ (PEP 479).
                return
            else:
                paragraph.append(line)  # Add to the group of items
                if line == "\n":        # If we've got a whole group
                    yield paragraph     # yield it
                    paragraph = []      # and start a new group
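It isn't pretty, in my opinion. It uses the internals of the iteration protocol, has an infinite loop that has to be broken out of, and just doesn't read well to me. So does anyone have a better way of writing this type of code?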
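Bear in mind that I'm looking for the pattern, not this specific example. In my case I'm reading packets that are themselves split across other packets, but each level is similar to the paragraph example.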
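To make the pattern concrete, here is a minimal sketch of the general shape I'm after, written with a plain `for` loop instead of `next()` and `try`. The names `group_until` and `is_end` are hypothetical, and emitting a trailing partial group is only an assumption about what "do whatever necessary" should mean at end of input.

```python
def group_until(items, is_end):
    """Collect items into lists, closing a group when is_end(item) is true."""
    group = []
    for item in items:       # works for any list, generator, or iterable
        group.append(item)
        if is_end(item):     # terminator seen: emit the finished group
            yield group
            group = []
    if group:                # assumption: emit a trailing partial group
        yield group

lines = ["a\n", "b\n", "\n", "c\n"]
paragraphs = list(group_until(lines, lambda line: line == "\n"))
```

Because it accepts any iterable, one `group_until` stage could feed another, which is roughly what the nested packet case would need.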