20

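I haven't found an answer to this particular question; perhaps there isn't one. But I've been wondering about it for a while.

What exactly causes a binary file to display as "gibberish" when you view it in a text editor? The same goes for encrypted files. Are the binary values of the file trying to be converted into ASCII? Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

Finally, is there a way to determine what program will properly open a data file? Many times, especially in Windows, a file is orphaned or otherwise not associated with a particular program. Opening it in a text editor sometimes tells you where it belongs, but most of the time it doesn't, because of the gibberish. If the extension doesn't provide any information, how can you determine what program it belongs to?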

4

7 Answers

19
  • Are the binary values of the file trying to be converted into ASCII?

Yes, that's exactly what's happening. Typically, the binary values of the file also include ASCII control characters that aren't printable, resulting in even more bizarre display in a typical text editor.

  • Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

It depends on your editor. What you want is a "hex editor", rather than a normal text editor. This will show you the raw contents of the file (typically in hexadecimal rather than binary, since the zeros and ones would take up a lot of space and be harder to read).
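As a rough illustration of what a hex editor shows, here is a minimal Python sketch that prints bytes as hexadecimal pairs alongside their printable ASCII characters, plus a literal 1s-and-0s view. The function names `hex_dump` and `bit_dump` and the sample bytes are made up for this example; real hex editors do considerably more.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex pairs, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is; replace everything else with a dot.
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


def bit_dump(data: bytes) -> str:
    """Format each byte as eight binary digits."""
    return " ".join(f"{b:08b}" for b in data)


if __name__ == "__main__":
    sample = b"GIF89a\x00\x01\xff"  # stand-in for the first bytes of some file
    print(hex_dump(sample))
    print(bit_dump(sample[:3]))
```

The bit view is eight characters per byte, which is why hexadecimal (two characters per byte) is the usual choice.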

  • Finally, is there a way to determine what program will properly open a data file?

There is a Linux command-line program called "file" that will attempt to analyze the file (typically looking for common header patterns) and tell you what sort of file it is (for example text, or audio, or video, or XML, etc). I'm not sure if there is an equivalent program for Windows. Of course, the output of this program is just a guess, but it can be very useful when you don't know what the format of a file is.

answered 2008-10-19T05:57:42.730
5

A binary file appears as gibberish because the data in it is designed for the machine to read and not for humans. Sadly, some of us get used to interpreting gibberish - albeit with somewhat specialized tools to help see the data better - but most people should not need to know.

Each byte in the file is treated as a character in the current code set (probably CP1252 on Windows). Byte value 65 is 'A', for example; you can find illustrative examples easily on the web. So, the bytes that make up the binary data are displayed according to the code set - as best as the text editor can. It doesn't try to convert the binary - it doesn't know how (only the original program does).
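To make the byte-to-character mapping concrete, here is a small Python sketch that decodes arbitrary bytes the way a naive text viewer might. The helper name `show_as_text` is hypothetical, and CP1252 is only assumed as the code set, following the guess above.

```python
def show_as_text(data: bytes, encoding: str = "cp1252") -> str:
    """Decode bytes as text, substituting U+FFFD for undefined byte values."""
    # errors="replace" avoids an exception on the few byte values
    # (such as 0x81) that CP1252 leaves unassigned.
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(show_as_text(bytes([65])))  # byte value 65 is the letter A

    # Arbitrary bytes: a control character, two high bytes, one undefined byte.
    raw = bytes([0x07, 0xE9, 0x80, 0x81])
    print(repr(show_as_text(raw)))
```

The control character and the replacement character are what typically show up as boxes, question marks, or odd symbols in an editor.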

As to how to detect what program created the file - you may be able to do that sometimes, but not easily and reliably. On Unix (or with Cygwin on Windows) the 'file' program may be able to help. This program looks at the first few bytes to try and guess the file type, which in turn hints at the program.

Encrypted data is supposed to look like gibberish. If it doesn't look like gibberish, then it probably isn't very well encrypted.
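One rough way to put a number on "looks like gibberish" is the Shannon entropy of the byte distribution. The sketch below is only an illustration: `shannon_entropy` is a name chosen for this example, and `os.urandom` is used as a stand-in for well-encrypted data, which is an assumption rather than real ciphertext.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the byte distribution in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 100
    random_like = os.urandom(len(text))  # stand-in for encrypted data
    print(f"plain text:  {shannon_entropy(text):.2f} bits/byte")
    print(f"random data: {shannon_entropy(random_like):.2f} bits/byte")
```

Values close to 8 bits per byte suggest encrypted or compressed data, though entropy alone cannot tell those two apart.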

answered 2008-10-19T05:58:59.670
3

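The display looks funny because a binary file can contain non-printable characters. It is up to the displaying program to replace those with other characters.

This can be prevented by using a hex editor. Such a program displays each byte in the file as its hexadecimal value. This gives a nice tabular view of the file, but it is not easy for most people to decipher, because we aren't used to looking at data this way.

There are several ways to find out which program a file might belong to. You can look at the start of the file and, with some knowledge, you may recognize the file type. Some types always start with the same characters (RAR, GIF, etc.). For other types, it may not be so easy.

On Linux, you can use the 'file' command to help determine the file type. There are probably programs for Windows that can do this too.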

answered 2008-10-19T05:53:44.610
2

Binary files display as gibberish in standard text editors such as Notepad because the editor decodes the bytes using a text encoding (e.g. ASCII or UTF-8) and maps them to characters for display. The output of this process generally makes as little sense to humans as the binary data being mapped, hence the gibberish you see.

As previously mentioned, these files make more sense when viewed in a different way, such as with a hex editor.

Certain file types can be recognized by data present in all files of a given type; for example, Windows executable files (*.exe) begin with the letters MZ.

answered 2008-10-19T05:59:39.983
2

Binary data often looks very random, and well-encrypted data in particular is designed to. Each byte can take one of 256 values, each of which can be shown as a character (leaving Unicode out of the equation). ASCII only covers 128 of these, and only 95 of those are printable characters (94 if you don't count the space). Outside the ASCII range, you have a number of international characters and strange symbols. There are certainly more than 128 of these, so one must specify a codepage to select a specific set of symbols.

Anyway, since binary files can be represented as a very random assortment of familiar and unfamiliar characters, the file will look like gibberish if you open it in an editor.
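The character counts are easy to check, and the same test gives a crude "is this text?" heuristic. This is a sketch only; `is_printable_ascii` and `printable_fraction` are names invented for the example.

```python
def is_printable_ascii(byte: int) -> bool:
    """True for the ASCII range from space (0x20) through tilde (0x7E)."""
    return 0x20 <= byte <= 0x7E


def printable_fraction(data: bytes) -> float:
    """Fraction of bytes in data that are printable ASCII."""
    if not data:
        return 0.0
    return sum(is_printable_ascii(b) for b in data) / len(data)


if __name__ == "__main__":
    count = sum(is_printable_ascii(b) for b in range(128))
    print(count)      # printable ASCII characters, including the space
    print(count - 1)  # the same count without the space

    print(printable_fraction(b"Hello, world"))
    print(printable_fraction(bytes(range(256))))  # every byte value once
```

A file whose printable fraction is well below 1 will show up as the mix of familiar and unfamiliar characters described above.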

You could always open a file (binary or text file, there really is no difference) in a hex editor, and look at the raw binary data.

There is no general way to tell which program created a specific file. In particular, if the program has encrypted its data, all hope is lost. Otherwise, it is often easy to recognize certain "signatures."

answered 2008-10-19T06:04:06.930
0

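Yes, WordPad and Notepad and many other text editors assume that any file you open with them is a text file, and will try to display the ASCII characters that the bytes in the file represent.

Hex editors are designed for viewing and editing binary files. They typically display each byte as a pair of hexadecimal digits rather than as "1s and 0s", because that is easier to read.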

answered 2008-10-19T05:54:49.400
0

Beyond things like character encoding, a text editor makes few assumptions about the data fed into it. So it will (as you said) read the file's data as ASCII and display it that way. Since binary data isn't always in the alphanumeric range, you get gibberish. As for displaying the raw binary values, you need a hex editor such as XVI32.

Binary files often have no context outside of the program that uses them. Some binary formats contain a 4-byte magic sequence at the beginning (for example, Java .class files start with the bytes CA FE BA BE, i.e. 0xCAFEBABE), but to recognize them without their program, you need a mapping of those byte sequences to formats. I believe some Linux distros contain this information for a wide variety of binary formats and will examine the beginning of the file to attempt to identify it. Other than that, there's not much you can do.
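The mapping idea can be sketched in a few lines of Python. The table below is a tiny, hand-picked sample of well-known signatures, and the names `SIGNATURES`, `guess_type`, and `guess_type_of_file` are hypothetical; this is not how any particular tool is implemented.

```python
from typing import Optional

# A small illustrative table of leading byte sequences ("magic numbers").
SIGNATURES = {
    b"\xca\xfe\xba\xbe": "Java class file",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"Rar!\x1a\x07": "RAR archive",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"MZ": "DOS/Windows executable",
}


def guess_type(header: bytes) -> Optional[str]:
    """Return a format name if header starts with a known signature."""
    # Check longer signatures first so a short one cannot shadow them.
    for magic in sorted(SIGNATURES, key=len, reverse=True):
        if header.startswith(magic):
            return SIGNATURES[magic]
    return None


def guess_type_of_file(path: str) -> Optional[str]:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return guess_type(f.read(16))


if __name__ == "__main__":
    print(guess_type(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))
    print(guess_type(b"just some text"))
```

Like any signature check, the result is a guess: different formats can share the same leading bytes (0xCAFEBABE is also used by Mach-O universal binaries), and many formats have no signature at all.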

answered 2008-10-19T05:56:18.163