I have an auto-generated info file from a measurement. It consists of both binary and human-readable parts, and I want to extract some of the non-binary metadata. For some files I cannot get to the metadata, because readlines() does not return the whole file. I suspect the file contains an EOF character. I can open the file in Notepad++ without problems.

A possible solution would be to read the file in binary mode and convert it to characters afterwards, deleting the EOF character while doing so. Still, I wonder whether there is a more elegant way to do this?

Edit: The question was rightfully downvoted, I should have provided code. I actually use

f = open(fname, 'r')
raw = f.readlines()

and then walk through the list. The EOF characters that are present (depending on the OS) seem to cause the havoc I am observing. I will accept the answer that suggests using the binary 'rb' flag. By the way, this was an impressive response time! (-:

2 Answers

with open(afile, "rb") as f:
    print(f.readlines())

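What's wrong with doing this?

If you don't open the file in binary mode, some non-ASCII characters are misinterpreted and/or discarded... which may inadvertently remove some ASCII text if it is intermixed with binary data.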
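Once the file has been read in binary mode, the human-readable parts still have to be separated from the binary ones. A minimal sketch of one way to do that, assuming the metadata is plain printable ASCII: pull out runs of printable bytes with a bytes regex. The helper name `extract_text_runs`, the `min_len` threshold and the sample bytes are made up for illustration and are not taken from the question.

```python
import re

def extract_text_runs(data, min_len=4):
    """Return runs of printable ASCII (at least min_len bytes) found in data."""
    # 0x20-0x7e covers space through '~'; everything else ends a run.
    pattern = ("[\\x20-\\x7e]{%d,}" % min_len).encode("ascii")
    return [run.decode("ascii") for run in re.findall(pattern, data)]

# Hypothetical stand-in for the bytes returned by open(fname, 'rb').read().
# It contains a 0x1a (Ctrl-Z) byte, which is assumed here to be the "EOF char".
sample = b"\x00\x01name=probe\x1a\xff\x00date=2014\x00ab"
runs = extract_text_runs(sample)
print(runs)
```

Short runs such as the trailing `ab` are dropped because they fall below `min_len`; lowering the threshold keeps more text but also more binary noise.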

answered 2014-05-26T13:40:28.367

You can use the read() function of the file object. It reads the whole file.

with open('input.bin', 'r') as f:
    content = f.read()

Then you can parse the content. If you know where the part you need starts, you can seek to it (e.g. if the file has a fixed-length binary start):

with open('input.bin', 'r') as f:
    f.seek(CONTENT_START)
    content = f.read()

On Windows, you should change the reading mode to 'rb' to indicate that you want to read the file in binary mode. In binary mode no newline translation takes place, so line endings in the text part may then appear as '\r\n', depending on how the file was created in the first place.
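Putting the seek and the binary mode together could look like the sketch below. It assumes the stray "EOF char" is the byte 0x1a (Ctrl-Z) and that the text part is ASCII. The function name `read_text_lines`, the 4-byte header and the sample content are hypothetical, and `io.BytesIO` only stands in for a file opened with `open('input.bin', 'rb')`.

```python
import io

EOF_BYTE = b"\x1a"  # Ctrl-Z; assumed to be the stray "EOF char"

def read_text_lines(f, content_start, encoding="ascii"):
    """Read the text part of a binary file object, starting at content_start."""
    f.seek(content_start)
    # Drop the assumed EOF byte before decoding.
    raw = f.read().replace(EOF_BYTE, b"")
    # Undecodable bytes become U+FFFD instead of raising an error.
    text = raw.decode(encoding, errors="replace")
    # splitlines() handles both '\n' and '\r\n' line endings.
    return text.splitlines()

# In-memory stand-in for a file with a made-up 4-byte binary header.
sample = io.BytesIO(b"\x00\x01\x02\x03key=1\r\nname=x\x1ay\r\n")
lines = read_text_lines(sample, content_start=4)
print(lines)
```

Because the data is decoded by hand, the same code behaves identically on every OS; no newline or EOF handling is left to text mode.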

answered 2014-05-26T13:43:46.567