-1

I am new to Python. I have large log files containing repeated strings.

Example:

abc
def
efg
gjk
abc
def
efg
gjk
abc
def
efg
gjk
abc
def
efg
gjk

Expected result:

--------------------Section1---------------------------
abc
def
efg
gjk
--------------------Section2---------------------------
abc
def
efg
gjk
--------------------Section3---------------------------
abc
def
efg
gjk
--------------------Section4---------------------------
abc
def
efg
gjk

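Could someone give me some direction on how to proceed? I tried grep for a specific string, but it only returned that string in a particular order. I want the whole log from abc to gjk placed in one section.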

4

3 Answers

2

If a section is defined by its starting line, you can use a generator function to yield sections from an input iterable:

def per_section(iterable):
    section = []
    for line in iterable:
        if line.strip() == 'abc':
            # start of a section, yield previous
            if section:
                yield section
            section = []

        section.append(line)

    # lines done, yield last
    if section:
        yield section

Use this with an input file, for example:

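with open('somefile') as inputfile:
    # number sections from 1 to match the expected output
    for i, section in enumerate(per_section(inputfile), 1):
        print('------- section {} ---------'.format(i))
        # lines keep their trailing newline, so suppress print's own
        print(''.join(section), end='')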
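
To try the idea without a log file on disk, here is a minimal self-contained sketch. The names `split_sections` and `format_sections`, the `start_marker` and `width` parameters, and the in-memory sample are assumptions made for illustration; the header layout follows the expected result in the question.

```python
import io


def split_sections(lines, start_marker="abc"):
    """Yield lists of lines; a new list starts at every start_marker line."""
    section = []
    for line in lines:
        if line.strip() == start_marker and section:
            # a new section begins, so hand back the previous one
            yield section
            section = []
        section.append(line)
    if section:
        # input exhausted, yield whatever is left
        yield section


def format_sections(lines, start_marker="abc", width=55):
    """Return the input text with a centered header before each section."""
    out = []
    for number, section in enumerate(split_sections(lines, start_marker), 1):
        out.append("Section{}".format(number).center(width, "-") + "\n")
        out.extend(section)
    return "".join(out)


# io.StringIO stands in for an open log file (assumed sample data)
sample = io.StringIO("abc\ndef\nefg\ngjk\n" * 2)
print(format_sections(sample), end="")
```

Any lines that appear before the first `abc` end up in a section of their own, which may or may not be what you want for a real log.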

If sections are based only on the number of lines, use the itertools grouper recipe to group the input iterable into fixed-length groups:

from itertools import zip_longest  # named izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

with open('somefile') as inputfile:
    # each group is a tuple of 4 lines; a short final group is padded with '\n'
    for i, section in enumerate(grouper(inputfile, 4, '\n'), 1):
        print('------- section {} ---------'.format(i))
        print(''.join(section), end='')
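
Note that the grouper recipe pads the last group with the fill value when the line count is not a multiple of the group size. If a shorter final group is preferable, one possible alternative is sketched below with `itertools.islice`; the name `chunked` is hypothetical and not part of itertools.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield lists of up to n items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        # take the next n items (fewer if the input runs out)
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


for number, chunk in enumerate(chunked(["abc\n", "def\n", "efg\n", "gjk\n", "abc\n"], 4), 1):
    print("------- section {} ---------".format(number))
    print("".join(chunk), end="")
```

Like the recipe, this reads the input lazily, so it should also work directly on an open file object.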
answered 2013-08-04T08:14:15.527
1

For simplicity (as you said, one record every 4 lines):

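with open('yourfile', 'r') as f:
    lines = f.readlines()  # reads the whole file into memory
while lines:
    print('----------------------------------')
    # join the next 4 lines instead of printing the list itself
    print(''.join(lines[:4]), end='')
    lines = lines[4:]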
answered 2013-08-04T09:00:17.580
0

Since you have a known starting point, you can emit a section header whenever you see the start of a section:

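>>> section = 0
>>> with open('bigdata.txt') as f:
        for line in f:
            if 'abc' in line:
                section += 1
                # center the header text in a 55-character line of dashes
                print(('Section' + str(section)).center(55, '-'))
            # the line already ends with a newline
            print(line, end='')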


------------------------Section1-----------------------
abc
def
efg
gjk
------------------------Section2-----------------------
abc
def
efg
gjk
------------------------Section3-----------------------
abc
def
efg
gjk
------------------------Section4-----------------------
abc
def
efg
gjk
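
For a large log it may be more convenient to write the result to another file than to print it. A rough sketch of that variation follows; the function name `write_with_headers` and its parameters are assumptions, and it matches the marker against the whole stripped line rather than using the substring test above.

```python
import io


def write_with_headers(src, dst, marker="abc", width=55):
    """Copy lines from src to dst, adding a header before each marker line.

    Returns the number of sections found.
    """
    count = 0
    for line in src:
        if line.strip() == marker:
            count += 1
            dst.write("Section{}".format(count).center(width, "-") + "\n")
        dst.write(line)
    return count


# io.StringIO objects stand in for real files opened with open()
src = io.StringIO("abc\ndef\nefg\ngjk\n" * 4)
dst = io.StringIO()
print(write_with_headers(src, dst))  # number of sections written
```

Because it handles one line at a time, memory use stays small regardless of the file size.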
answered 2013-08-04T08:12:02.433