
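Say I have two files, (name).n.rar and (name).n+1.rar, that appear to be part of the same set (same size, etc.). Is there an easy way to tell whether they are actually part of the same set without downloading the complete set first? Currently, the only way I can tell is to download an instance of each file and see whether WinRAR gives me an error when I try to unpack them.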

(And on a related note, assuming there is such a way, can I do the same thing without having adjacent parts?)

Ideally there is an existing program that does this, but I can write the code myself if necessary.

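Further clarification: these are two sets of archives of the same file. They look identical under the obvious checks: the file names are consecutive, the contents are plausible, the sizes are the same, and the number of parts is the same. I then end up with a full set of files. If the parts are not all from the same set, I cannot extract them, although WinRAR seems to reach 100% before giving me a CRC error (corrupt file).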


2 Answers


New Answer

All tests were made using WinRAR 5.01 32-bit. Since the algorithm should be the same, the following statements should also hold for earlier versions. Feel free to comment if you know that's not true.

I'll give a short summary of the chat discussion. I packed a file larger than 1 GB several times, then mixed up the parts and tried to extract the archives: it worked. So the size of the file was not the problem.

I thought about three possible explanations for the problem:

  1. Architecture influences the packing process: different people packed the files on different machines, so mixing their parts results in an error;
  2. Different people packed the files with slightly different part sizes (for example 250 MB and 250000 KB). This would have been noticed in the file properties, though;
  3. Files were corrupted during the download: re-downloading them would confirm this hypothesis.
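
The second hypothesis can be checked from file sizes alone, before opening anything. Here is a minimal sketch, assuming every part except the last should have exactly the same size in bytes. The function name `odd_sized_parts` is made up for illustration, and reading MB and KB as powers of 1024 is an assumption about how the part size was entered:

```python
from collections import Counter


def odd_sized_parts(sizes):
    """Return indexes of parts whose size differs from the most common size.

    `sizes` lists the byte size of each part in order. The last part is
    ignored because it is normally shorter than the others.
    """
    body = sizes[:-1]
    if not body:
        return []
    expected, _ = Counter(body).most_common(1)[0]
    return [i for i, size in enumerate(body) if size != expected]


# Assumption: 250 MB means 250 * 1024 * 1024 bytes, 250000 KB means 250000 * 1024 bytes.
mb_part = 250 * 1024 * 1024
kb_part = 250000 * 1024
print(odd_sized_parts([mb_part, mb_part, kb_part, mb_part, 1234]))
```

On real files the sizes could come from `os.path.getsize` for each part. Under the stated assumption the two part sizes differ by about 6 MB, which is why this mismatch would be easy to spot.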

I was most curious about the first one: could architecture influence the packing process?

I found out that the answer is yes. Here are the steps to repeat the experiment:

  1. Pack your files into a multi-part archive with a specific part size on computer A;
  2. Pack exactly the same files with exactly the same part size on computer B, which has a different architecture (e.g. an Intel processor versus an AMD processor). (TODO: check whether this experiment still holds for similar architectures, e.g. Intel i7 and Intel i5);
  3. Delete one or more parts (but of course not all of them) from computer A, then transfer the corresponding parts from computer B to computer A;
  4. Place all the files in the same directory and check that they all follow the same naming scheme (e.g. "AAA part1", "AAA part2"...);
  5. Extract them;
  6. Enjoy your CRC error!

Tests were made using an Intel i7-3632QM and an AMD FX 6300.

I suspect that the compressed data are the same in both sets and that only the stored CRC values differ.
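
One way to probe this suspicion is to compare a part produced on computer A with the same-numbered part from computer B and see where they first differ. The sketch below works on bytes already in memory; `compare_parts` is a hypothetical helper, and the CRC-32 it computes covers the whole file, which is not the same thing as the CRC values stored inside the RAR headers:

```python
import zlib


def compare_parts(a: bytes, b: bytes):
    """Return (same_crc32, offset of the first differing byte or None)."""
    same_crc = zlib.crc32(a) == zlib.crc32(b)
    limit = min(len(a), len(b))
    for offset in range(limit):
        if a[offset] != b[offset]:
            return same_crc, offset
    # No difference in the shared prefix: identical, or one input is longer.
    return same_crc, (None if len(a) == len(b) else limit)


# Made-up stand-ins for two parts that differ only in their last two bytes.
part_a = b"Rar!\x1a\x07\x00" + b"same payload" + b"\x11\x22"
part_b = b"Rar!\x1a\x07\x00" + b"same payload" + b"\x33\x44"
print(compare_parts(part_a, part_b))
```

If the first difference sits in a header near the start or the end of the part rather than in the middle of the data, that would be consistent with the suspicion, though it would not prove it. For real volumes of hundreds of megabytes, reading both files in chunks would be more practical than this byte-by-byte loop.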


Old Answer

There is indeed a way. During my Computer Science studies I took a Computer Forensics class. I learned that most file formats have a fixed beginning (a header, we could say) that lets a program recognize the type and how to decode it. To see it, just open the file with a text editor (Notepad++ is the best so far, I guess).

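For example, JPEG images begin with the bytes FF D8 FF, which a text editor shows as ÿØÿ, followed by a marker byte that varies (á in the example ÿØÿá).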
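
The same check can be done in code instead of a text editor. This is a small sketch, assuming the editor displayed the bytes as Latin-1 characters; `looks_like_jpeg` is an illustrative name, not part of any library:

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # start-of-image marker plus the first marker prefix


def looks_like_jpeg(head: bytes) -> bool:
    """True if the data starts with the JPEG magic bytes."""
    return head.startswith(JPEG_MAGIC)


# Map the characters seen in the editor back to raw bytes (Latin-1 assumed).
shown = "ÿØÿá".encode("latin-1")
print(shown.hex())
print(looks_like_jpeg(shown))
```

In practice `head` would be the first few bytes read from the file in binary mode, for example `open(path, "rb").read(16)`.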

I tried storing a video in several split .rar files, and finding out whether they are part of the same archive was simpler than I thought.

Every RAR file begins with Rar!. On the second or third line, the name of the file stored in the archive should appear: in my case, myVideo.mp4. If all your archives contain that filename, they're probably part of the same archive.

Things get harder if there are several files in the archive and you don't know their names. In fact, if there is more than one file, the structure of the RAR files is as follows:
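
A rough sketch of that manual inspection in code: check the signature, then search the start of the volume for a known file name. The helper names and the 4096-byte window are assumptions for illustration. The search only works when the headers are not encrypted and the name is stored as plain ASCII or UTF-8 bytes, and the sample volume below is hand-made filler, not a valid archive:

```python
RAR4_MAGIC = b"Rar!\x1a\x07\x00"      # RAR 1.5 to 4.x signature
RAR5_MAGIC = b"Rar!\x1a\x07\x01\x00"  # RAR 5.0 signature


def rar_format(head: bytes):
    """Return "rar5", "rar4", or None depending on the leading signature."""
    if head.startswith(RAR5_MAGIC):
        return "rar5"
    if head.startswith(RAR4_MAGIC):
        return "rar4"
    return None


def mentions_name(volume: bytes, filename: str, window: int = 4096) -> bool:
    """Naive check: does the file name appear near the start of the volume?"""
    return filename.encode("utf-8") in volume[:window]


# Hand-made stand-in for the first bytes of a volume.
fake_volume = RAR4_MAGIC + b"\x00" * 20 + b"myVideo.mp4" + b"\x00" * 4
print(rar_format(fake_volume), mentions_name(fake_volume, "myVideo.mp4"))
```

Only the first few kilobytes of each part are needed for this, so a partial download of each candidate part may be enough to run the check.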

File 1:

Rar!
NUL NUL NUL //Random things here
NUL NUL NUL NUL NUL myVideo.mp4 NUL NUL NUL NUL
//Random things here. If the file is larger than the space left in this archive,
//the next archive will begin with the same name.
//Let's assume that this is happening.
EOF

File 2:

Rar!
NUL NUL NUL //Random things here
NUL NUL myVideo.mp4 NUL NUL NUL
//This time the file is complete. Since there is still space in the archive,
//it will add another file
NUL NUL NUL NUL mySecondVideo.mp4 NUL NUL NUL NUL
EOF

Let's assume that at the end of the second archive, mySecondVideo hasn't been fully compressed yet.

File 3:

Rar!
NUL NUL NUL
NUL NUL NUL NUL mySecondVideo.mp4 NUL
NUL NUL NUL
NUL myTextFile.txt
NUL NUL NUL mySecondTextFile.txt NUL
EOF

If mySecondTextFile.txt isn't yet fully compressed, my fourth file will begin with its name.

I hope it's clear; I tried to keep it as simple as possible. With several files, I would start from the last archive: write down the first filename found in it and search for that name in the previous one. If the name is found, repeat the step until you reach the first archive.
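
The backward walk described above can be sketched as follows, assuming the file names seen in each part have already been collected into ordered lists (by eye or by some parser). The function name `broken_links` is hypothetical:

```python
def broken_links(names_per_volume):
    """Walk from the last volume back to the first.

    `names_per_volume[k]` is the ordered list of file names seen in volume k.
    Returns the indexes k whose first name does not appear in volume k - 1.
    """
    broken = []
    for k in range(len(names_per_volume) - 1, 0, -1):
        current, previous = names_per_volume[k], names_per_volume[k - 1]
        if not current or current[0] not in previous:
            broken.append(k)
    return sorted(broken)


# The three example files from this answer.
volumes = [
    ["myVideo.mp4"],
    ["myVideo.mp4", "mySecondVideo.mp4"],
    ["mySecondVideo.mp4", "myTextFile.txt", "mySecondTextFile.txt"],
]
print(broken_links(volumes))
```

This is only a heuristic. A file that ends exactly on a part boundary would be flagged even though the set is fine, and two sets packed from the same files pass the check even when mixing them produces a CRC error.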

answered 2013-12-17T08:54:46.307

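I'm not very familiar with the RAR format, but if you decide to write the program in Java, I can recommend 7-Zip-JBinding.

You could download the first n+1 parts of the archive, then call the extract() method, ignoring the output data and paying attention only to the

IArchiveExtractCallback.setOperationResult(ExtractOperationResult)

call (to check whether the CRC is OK), while monitoring which files are opened through

IArchiveOpenVolumeCallback.getStream(java.lang.String)

If volume n+2 is requested, you can conclude that volume n+1 is correct. (I'm not 100% sure about this conclusion, but I would give it a try.)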

answered 2013-12-17T08:22:23.337