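I am trying to read specific lines from a file and continue reading after finishing the processing of each chunk. Say I have 19000 lines in a file. Each time, I extract the first 19 lines, do some calculations with them, and write the output to another file. Then I extract the next 19 lines and do the same processing. So I tried to extract the lines in the following way: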

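from collections import defaultdict
from itertools import izip_longest

n = 19
x = defaultdict(list)

i = 0

fp = open("file")
# izip_longest pulls n lines at a time from the same file iterator
for next_n_lines in izip_longest(*[fp] * n):
    lines = next_n_lines

    for i, line in enumerate(lines):
        pass  # do calculation
    # write results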

The code here works for the first chunk. Could any of you help me with how to continue to the next chunk of n lines? Thanks very much in advance!

3 Answers

Your code already extracts lines in groups of 19, so I am not sure what your problem is.

I can clean up your solution a bit, but it does the same thing as your code:

from itertools import izip_longest

# grouping recipe from itertools documentation
def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

def process_chunk(chunk):
    "Return sequence of result lines.  Chunk must be iterable."
    for i, line in enumerate(chunk):
        yield 'file-line {1:03d}; chunk-line {0:02d}\n'.format(i, int(line))
    yield '----------------------------\n'

Here is some test code that shows every line is visited:

from StringIO import StringIO

class CtxStringIO(StringIO):
    def __enter__(self):
        return self
    def __exit__(self, *args):
        return False

infile = CtxStringIO(''.join('{}\n'.format(i) for i in xrange(19*10)))
outfile = CtxStringIO()


# this should be the main loop of your program.
# just replace infile and outfile with real file objects
with infile as ifp, outfile as ofp:
    for chunk in grouper(19, ifp, '\n'):
        ofp.writelines(process_chunk(chunk))

# see what was written to the file
print ofp.getvalue()

This test case should print lines like the following:

file-line 000; chunk-line 00
file-line 001; chunk-line 01
file-line 002; chunk-line 02
file-line 003; chunk-line 03
file-line 004; chunk-line 04
...
file-line 016; chunk-line 16
file-line 017; chunk-line 17
file-line 018; chunk-line 18
----------------------------
file-line 019; chunk-line 00
file-line 020; chunk-line 01
file-line 021; chunk-line 02
...
file-line 186; chunk-line 15
file-line 187; chunk-line 16
file-line 188; chunk-line 17
file-line 189; chunk-line 18
----------------------------
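
The code above targets Python 2 (`izip_longest`, `xrange`, the `print` statement). As a minimal sketch of the same grouping idea on Python 3, assuming `itertools.zip_longest` is the replacement you want and using a hypothetical 40-line in-memory input so that the last chunk is short:

```python
from io import StringIO
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)

# Hypothetical input: 40 numbered lines standing in for a real file.
infile = StringIO(''.join('{}\n'.format(i) for i in range(40)))

chunks = []
for chunk in grouper(19, infile):
    # Drop the None padding that fills out the final, shorter chunk.
    lines = [line for line in chunk if line is not None]
    chunks.append(lines)

print([len(c) for c in chunks])  # [19, 19, 2]
```

Filtering out the fill value matters when the line count is not a multiple of the chunk size; otherwise the padding would reach the calculation as if it were real data.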
answered 2013-04-29T16:28:23.067

This solution does not need to load all the lines into memory.

n = 19
fp = open("file")
next_n_lines = []
for line in fp:
    next_n_lines.append(line)
    if len(next_n_lines) == n:
        # do calculation on the full chunk of n lines
        next_n_lines = []
if len(next_n_lines) > 0:
    # do calculation on the final, shorter chunk
    pass
# write results
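
One way to keep the accumulation logic out of the main loop is to wrap it in a generator. This is only a sketch: `read_chunks` and `summarize` are hypothetical names, the calculation is a placeholder that sums integers, and the input is an in-memory stand-in for the real file.

```python
from io import StringIO

def read_chunks(fp, n):
    """Yield lists of up to n lines from fp, one list at a time."""
    chunk = []
    for line in fp:
        chunk.append(line)
        if len(chunk) == n:
            yield chunk
            chunk = []
    if chunk:  # leftover lines when the total is not a multiple of n
        yield chunk

def summarize(chunk):
    """Placeholder calculation: sum the integers in one chunk."""
    return sum(int(line) for line in chunk)

# Hypothetical input: 45 numbered lines, so the last chunk has 7 lines.
source = StringIO(''.join('{}\n'.format(i) for i in range(45)))
results = [summarize(chunk) for chunk in read_chunks(source, 19)]
print(results)
```

Only one chunk is held in memory at a time, and the trailing partial chunk goes through the same code path as the full ones.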
answered 2013-04-29T16:11:13.510

Your question is not clear, but I guess the calculation you do depends on all N lines you extract (19 in your example).

So it is better to extract all of those lines and then do the work:

N = 19
inFile = open('myFile')
i = 0
lines = list()

for line in inFile:
    lines.append(line)
    i += 1
    if i == N:
        # Do calculations and save on output file
        lines = list()
        i = 0
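
Note that the loop above only runs the calculation on full groups of N lines, so a shorter group at the end of the file would be skipped. A possible alternative, sketched here with `itertools.islice` and hypothetical in-memory stand-ins for the input and output files, reads up to N lines per pass and stops when nothing is left:

```python
from io import StringIO
from itertools import islice

N = 19
# Hypothetical stand-ins for the real input and output files.
in_file = StringIO(''.join('{}\n'.format(i) for i in range(40)))
out_file = StringIO()

while True:
    lines = list(islice(in_file, N))  # next N lines, fewer at end of file
    if not lines:
        break
    # Placeholder calculation: record how many lines the chunk held.
    out_file.write('{}\n'.format(len(lines)))

print(out_file.getvalue())
```

This removes the manual counter, and the last pass simply receives whatever lines remain.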
answered 2013-04-29T16:06:41.570