python - Reading file lines and characters

Question

I have an input file that looks like this:

some data...
some data...
some data...
...
some data...
<binary size="2358" width="32" height="24">
data of size 2358 bytes
</binary>
some data...
some data...

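The value 2358 in the size attribute of the binary tag can differ from file to file. I want to extract those 2358 bytes of data from this file (the size is a variable) and write them to another file.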

I wrote the following code for this, but it gives me an error: I can't extract those 2358 bytes of binary data and write them to another file.

c = responseFile.read(1)
ValueError: Mixing iteration and read methods would lose data

The code is:

import re

outputFile = open('output', 'w')    
inputFile = open('input.txt', 'r')
fileSize=0
width=0
height=0

for line in inputFile:
    if "<binary size" in line:
        x = re.findall('\w+', line)
        fileSize = int(x[2])
        width = int(x[4])
        height = int(x[6])
        break

print x
# Here the file will point to the start location of 2358 bytes.
for i in range(0,fileSize,1):
    c = inputFile.read(1)
    outputFile.write(c)


outputFile.close()
inputFile.close()
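The error comes from Python 2's built-in file object: its next() method, which `for line in inputFile` uses, reads ahead into a hidden buffer, so a later read() would skip data the loop had buffered but not yet returned, and Python raises ValueError instead of silently losing it. Python 3's `io` file objects keep one consistent position, so after breaking out of a `for` loop, read() continues right after the last line returned. A sketch of the question's loop under that assumption (Python 3, file opened in binary mode; `extract_after_tag` is a hypothetical name):

```python
import os
import re
import tempfile


def extract_after_tag(path):
    """Return the payload that follows the <binary size="..."> line in `path`."""
    with open(path, "rb") as inputFile:
        fileSize = 0
        for line in inputFile:
            if b"<binary size" in line:
                x = re.findall(rb"\w+", line)  # [b'binary', b'size', b'2358', ...]
                fileSize = int(x[2])
                break
        # Python 3 binary files keep a single consistent position, so read()
        # starts right after the tag line that ended the loop.
        return inputFile.read(fileSize)


# Usage: write a small sample file, extract its payload, clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'some data...\n<binary size="5" width="32" height="24">\nab\nde\n</binary>\n')
print(extract_after_tag(tmp.name))  # b'ab\nde'
os.remove(tmp.name)
```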

The final solution to my question:

#!/usr/local/bin/python

import os
inputFile = open('input', 'rb')    # binary mode: payload bytes are not translated
outputFile = open('output', 'wb')

flag = False

for line in inputFile:
    if line.startswith("<binary size"):
        print 'Start of Data'
        flag = True
    elif line.startswith("</binary>"):
        flag = False
        print 'End of Data'
    elif flag:
        outputFile.write(line)  # keep the newline; only the last one is trimmed below

inputFile.close()
outputFile.close()

# I have to delete the last extra new line character from the output.
size = os.path.getsize('output')
outputFile = open('output', 'ab')
outputFile.truncate(size-1)
outputFile.close()

score 3 · Accepted Answer

How about a different approach? In pseudocode:

for each line in input file:
    if line starts with binary tag: set output flag to True
    if line starts with binary-termination tag: set output flag to False
    if output flag is True: copy line to the output file

In actual code:

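outputFile = open('./output', 'wb')
inputFile = open('./input.txt', 'rb')

flag = False
chunks = []

for line in inputFile:

    if line.startswith("<binary size"):
        flag = True
    elif line.startswith("</binary>"):
        flag = False
    elif flag:
        chunks.append(line)  # keep newlines: they may be part of the binary data

# Drop only the newline that precedes </binary>; it is not payload.
outputFile.write(''.join(chunks)[:-1])

outputFile.close()
inputFile.close()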
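The flag-based approach can also be written as a Python 3 function over the lines of a file opened in binary mode. This is a sketch; `lines_between_tags` is a hypothetical name. It keeps newlines inside the payload, because they may be data bytes, and drops only the newline immediately before `</binary>`. Like the loop it mirrors, it assumes no payload line happens to begin with `</binary>`.

```python
def lines_between_tags(lines):
    """Return the bytes between the <binary ...> and </binary> lines.

    `lines` is any iterable of byte strings that keep their b"\n" endings,
    such as a file opened with open(path, "rb").
    """
    inside = False  # the "output flag" from the pseudocode
    chunks = []
    for line in lines:
        if line.startswith(b"<binary size"):
            inside = True
        elif line.startswith(b"</binary>"):
            inside = False
        elif inside:
            chunks.append(line)  # keep newlines: they may be payload bytes
    data = b"".join(chunks)
    # The newline right before </binary> belongs to the markup, not the payload.
    if data.endswith(b"\n"):
        data = data[:-1]
    return data


sample = [
    b"some data...\n",
    b'<binary size="5" width="32" height="24">\n',
    b"ab\n",
    b"de\n",
    b"</binary>\n",
    b"some data...\n",
]
print(lines_between_tags(sample))  # b'ab\nde'
```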

score 2 · Accepted Answer

Try changing your first loop to the following:

while True:
    line = inputFile.readline()
    if not line:  # readline() returns '' at end of file
        break
    # continue the loop as it was

This gets rid of the iteration and uses only read methods, so the problem should go away.
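Combining the readline() loop with a sized read() gives a complete version. A minimal sketch, assuming Python 3 and a file object opened in binary mode; `extract_binary` is a hypothetical name, and the regex assumes size is the first attribute after `<binary`, as in the question's sample. Because the byte count comes from the tag, payload bytes that look like newlines or `</binary>` are copied unchanged.

```python
import io
import re


def extract_binary(f):
    """Read the payload of the first <binary size="..."> element from `f`.

    `f` must be a file object opened in binary mode ("rb").
    """
    while True:
        line = f.readline()
        if not line:  # b"" means end of file: no <binary> tag was found
            raise ValueError("no <binary> tag found")
        m = re.search(rb'<binary size="(\d+)"', line)
        if m:
            size = int(m.group(1))
            break
    # readline() and read() share one file position, so read() starts
    # right after the tag line.
    data = f.read(size)
    if len(data) != size:
        raise ValueError("file ended before %d payload bytes" % size)
    return data


sample = io.BytesIO(
    b'some data...\n<binary size="5" width="32" height="24">\nab\nde\n</binary>\n'
)
print(extract_binary(sample))  # b'ab\nde'
```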

score 1 · Accepted Answer

Consider this approach:

import re

line = '<binary size="2358" width="32" height="24">'

m = re.search(r'size="(\d+)"', line)

print m.group(1)  # 2358

This differs from your code, so it is not a drop-in replacement; the point is the different way it uses regular expressions.

It uses Python's regular-expression group capturing, which is much more robust than your string-splitting approach.

For example, consider what happens if the attributes are reordered:

<binary width="32" size="2358" height="24">
instead of
<binary size="2358" width="32" height="24">

Would your code still work? Mine would. :-)
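To get every attribute regardless of order, the same group-capture idea extends to re.findall with two groups. A sketch in Python 3; `parse_attributes` is a hypothetical helper, and it assumes simple double-quoted attributes without escaped quotes, so it is not a general XML parser.

```python
import re


def parse_attributes(tag):
    """Map each name="value" attribute in `tag` to its (string) value."""
    # Two groups, so findall returns (name, value) tuples.
    return dict(re.findall(r'(\w+)="([^"]*)"', tag))


a = parse_attributes('<binary size="2358" width="32" height="24">')
b = parse_attributes('<binary width="32" size="2358" height="24">')
print(int(a["size"]), int(b["size"]))  # 2358 2358
```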

Edit: to answer your question:

If you want to read n bytes of data from the beginning of a file, you can do something like:

bytes = ifile.read(n)

Note that if the input file is not long enough, you may get fewer than n bytes.
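If a short read should be an error rather than silently accepted, a small wrapper can check the length. A sketch assuming a buffered binary file object (such as one returned by open(path, "rb")), whose read(n) returns fewer than n bytes only at end of file; `read_exact` is a hypothetical name, and io.BytesIO stands in for a real file.

```python
import io


def read_exact(f, n):
    """Read exactly n bytes from the binary file object f, or raise EOFError."""
    data = f.read(n)
    if len(data) < n:
        raise EOFError("expected %d bytes, got only %d" % (n, len(data)))
    return data


print(read_exact(io.BytesIO(b"0123456789"), 4))  # b'0123'
```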

If you want to start at some byte other than the zeroth, call seek() first, like this:

ifile.seek(9)
bytes = ifile.read(5)

This gives you the bytes at offsets 9 through 13 (the slice [9:14] of the file contents), i.e. the 10th through 14th bytes.
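A quick way to check which bytes seek(9) followed by read(5) returns is to run it on an in-memory file; here io.BytesIO stands in for a real file opened with "rb".

```python
import io

f = io.BytesIO(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")
f.seek(9)          # move to offset 9, the 10th byte ("J")
chunk = f.read(5)  # read offsets 9, 10, 11, 12, 13
print(chunk)                         # b'JKLMN'
print(chunk == f.getvalue()[9:14])   # True
```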
