I have some data in the following format:
data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST
BatCreate = 20010724

[Data-1]
Data = SAMP
SampNum = 357
SampLane = 1

[Data-2]
Data = SAMP
SampNum = 357
SampLane = 2

[Data-9]
Data = BATCH
BatProtocol = VCA
BatCreate = 20010725

[Data-10]
Data = SAMP
SampNum = 359
SampLane = 1

[Data-11]
Data = SAMP
SampNum = 359
SampLane = 2
"""
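Each text block has this structure:
- [Data-x], where x is a number
- Data = followed by either BATCH or SAMP
- a few more lines

I am trying to write a function that yields one list per "batch". The first item of each list is the text block containing the line Data = BATCH, and the following items are the text blocks containing the line Data = SAMP. I currently have: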
def get_batches(data):
    textblocks = iter([txt for txt in data.split('\n\n') if txt.strip()])
    batch = []
    sample = next(textblocks)
    while True:
        if 'BATCH' in sample:
            batch.append(sample)
        sample = next(textblocks)
        if 'BATCH' in sample:
            yield batch
            batch = []
        else:
            batch.append(sample)
When called like this:
batches = get_batches(data)
for batch in batches:
    print batch
    print '_' * 20
However, it only returns the first "batch":
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
'[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
'[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
whereas my expected output would be:
['[Data-0]\nData = BATCH\nBatProtocol = DIAG-ST\nBatCreate = 20010724',
'[Data-1]\nData = SAMP\nSampNum = 357\nSampLane = 1',
'[Data-2]\nData = SAMP\nSampNum = 357\nSampLane = 2']
____________________
['[Data-9]\nData = BATCH\nBatProtocol = VCA\nBatCreate = 20010725',
'[Data-10]\nData = SAMP\nSampNum = 359\nSampLane = 1',
'[Data-11]\nData = SAMP\nSampNum = 359\nSampLane = 2']
____________________
What am I missing, or how can I improve my function?
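For reference, here is a minimal sketch of one direction that might work. My guess is that the function above only yields when it meets the next BATCH block, so the final batch is still sitting in `batch` when `next(textblocks)` raises StopIteration and the generator ends. The sketch assumes that blocks are separated by blank lines and that every batch starts with a block containing `Data = BATCH`. `sample_data` is just an abbreviated copy of the data above, used for illustration.

```python
def get_batches(data):
    """Yield one list per batch: the BATCH block first, then its SAMP blocks."""
    batch = []
    for block in data.split('\n\n'):
        block = block.strip()  # drop surrounding newlines from each block
        if not block:
            continue
        # A new BATCH block closes whatever batch has been collected so far.
        if 'Data = BATCH' in block and batch:
            yield batch
            batch = []
        batch.append(block)
    # The last batch has no following BATCH block to trigger a yield,
    # so it is emitted once the loop has run out of blocks.
    if batch:
        yield batch


# Abbreviated copy of the data from the question (assumption: blank-line separated).
sample_data = """
[Data-0]
Data = BATCH
BatProtocol = DIAG-ST

[Data-1]
Data = SAMP
SampNum = 357

[Data-9]
Data = BATCH
BatProtocol = VCA

[Data-10]
Data = SAMP
SampNum = 359
"""

batches = list(get_batches(sample_data))
```

Using a plain `for` loop instead of calling `next()` by hand avoids relying on StopIteration to end the generator, which newer Python versions treat as an error inside a generator body.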