3

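I need to analyze a zip file to check how large its contents are, but ZipEntry.getSize() keeps returning -1. According to the specification that is the value returned when the uncompressed size is unknown, yet 7-zip somehow does know the actual size, because it shows it when I open the zip with it.

Does anyone know how 7-zip does this? Is it just estimating?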


2 Answers

4

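Perhaps ZipEntry is only being filled from the local file header, and not from the central directory, which is written at the end of the zip archive once compression has finished and which should contain the actual file size information.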
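To make the difference concrete, here is a minimal sketch, assuming the -1 comes from reading entries through java.util.zip.ZipInputStream (the question does not say which API was used). ZipInputStream fills each ZipEntry from the local file header, while java.util.zip.ZipFile reads the central directory. The class name ZipSizeDemo and its helper methods are made up for this illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original answer.
final class ZipSizeDemo {

    /** Size of the first entry as seen by ZipInputStream right after getNextEntry() (local file header). */
    public static long sizeFromLocalHeader(File zip) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry entry = in.getNextEntry();
            return entry.getSize();
        }
    }

    /** Size of the first entry as seen by ZipFile (central directory). */
    public static long sizeFromCentralDirectory(File zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            return zf.entries().nextElement().getSize();
        }
    }

    /** Writes a one-entry archive without setting the size up front, so the sizes are streamed after the data. */
    public static File writeSample(byte[] payload) throws IOException {
        File zip = File.createTempFile("size-demo", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("data.bin"));
            out.write(payload);
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        File zip = writeSample(new byte[5000]);
        System.out.println("ZipInputStream: " + sizeFromLocalHeader(zip));
        System.out.println("ZipFile:        " + sizeFromCentralDirectory(zip));
    }
}
```

For an archive written this way, the first call should report -1 and the second the real size of 5000 bytes. If random access to the file is available, ZipFile is usually the simpler route to the sizes.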

answered 2013-07-05T11:42:52.303
0

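For those interested, here is the code I used to parse the zip (keep in mind that zip is little-endian). I used Wikipedia (http://en.wikipedia.org/wiki/ZIP_%28file_format%29) as a reference for the structure.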

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public static List<ZipCentralFileHeader> getCentralDirectory(File file) throws IOException {
    List<ZipCentralFileHeader> entries = new ArrayList<ZipCentralFileHeader>();
    RandomAccessFile input = new RandomAccessFile(file, "r");
    try {
        // only check the last 10 meg, make sure this is large enough depending on your data
        long sizeToSkip = Math.max(0, file.length() - (1024 * 1024 * 10));
        byte[] buffer = new byte[(int) (file.length() - sizeToSkip)];
        // seek() + readFully() rather than skip() + read(): the latter two may process fewer bytes than requested
        input.seek(sizeToSkip);
        input.readFully(buffer);
        for (int i = 0; i + 4 <= buffer.length; i++) {
            // central directory file header, signature 0x02014b50 (stored little-endian)
            if (buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x01 && buffer[i + 3] == 0x02) {
                // the fixed part of the header is 46 bytes; ignore candidates that do not fit in the buffer
                if (i + 46 > buffer.length)
                    continue;
                Date lastModified = dosToJavaTime(get32(buffer, i + 12));
                // 0xFFFFFFFF in a size field means the real value is in the ZIP64 extra field (not handled here)
                long compressedSize = get32(buffer, i + 20);
                long uncompressedSize = get32(buffer, i + 24);
                int nameLength = get16(buffer, i + 28);
                int extraFieldLength = get16(buffer, i + 30);
                int commentLength = get16(buffer, i + 32);
                int headerLength = 46 + nameLength + extraFieldLength + commentLength;
                if (i + headerLength > buffer.length)
                    continue;

                // assumes UTF-8 names; strictly that is only guaranteed when general purpose flag bit 11 is set
                String fileName = new String(Arrays.copyOfRange(buffer, i + 46, i + 46 + nameLength), "UTF-8");
                String comment = new String(Arrays.copyOfRange(buffer, i + 46 + nameLength + extraFieldLength, i + headerLength), "UTF-8");

                entries.add(new ZipCentralFileHeader(fileName, lastModified, compressedSize, uncompressedSize, comment));
                // continue after this header so bytes inside names or comments are not mistaken for a signature
                i += headerLength - 1;
            }
            // the end of the central directory, signature 0x06054b50
            else if (buffer[i] == 0x50 && buffer[i + 1] == 0x4b && buffer[i + 2] == 0x05 && buffer[i + 3] == 0x06) {
                // the fixed part of this record is 22 bytes
                if (i + 22 > buffer.length)
                    continue;
                // each header starts the same, there is no general start sequence for the entire central directory
                // as such a signature scan can't really be sure it got them all unless it scans the entire file
                // the trailing section however contains the necessary information to validate the amount
                // (it also stores the central directory's size at offset 12 and its start offset at offset 16)
                int amountOfFileHeaders = get16(buffer, i + 8);
                if (amountOfFileHeaders != entries.size())
                    throw new IOException("Could only read " + entries.size() + "/" + amountOfFileHeaders + " headers for " + file + ", you likely did not read enough of the file");
                break;
            }
        }
    }
    finally {
        input.close();
    }
    return entries;
}
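
The ZipCentralFileHeader type used by getCentralDirectory is not shown in the answer. A minimal sketch of what it might look like, assuming it is nothing more than an immutable value holder matching the constructor call above (the getter names are guesses, not the author's API):

```java
import java.util.Date;

// Hypothetical value class; only the constructor signature is taken from the answer's code.
final class ZipCentralFileHeader {
    private final String fileName;
    private final Date lastModified;
    private final long compressedSize;
    private final long uncompressedSize;
    private final String comment;

    public ZipCentralFileHeader(String fileName, Date lastModified, long compressedSize,
                                long uncompressedSize, String comment) {
        this.fileName = fileName;
        // Date is mutable, so keep a private copy
        this.lastModified = new Date(lastModified.getTime());
        this.compressedSize = compressedSize;
        this.uncompressedSize = uncompressedSize;
        this.comment = comment;
    }

    public String getFileName() { return fileName; }

    public Date getLastModified() { return new Date(lastModified.getTime()); }

    public long getCompressedSize() { return compressedSize; }

    public long getUncompressedSize() { return uncompressedSize; }

    public String getComment() { return comment; }

    @Override
    public String toString() {
        return fileName + " (" + uncompressedSize + " bytes, " + compressedSize + " compressed)";
    }
}
```

With a class like this in place, the total uncompressed size of an archive is just the sum of getUncompressedSize() over the returned list.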

The utility methods get16, get32, get64 and dosToJavaTime are copies of the existing ZipEntry code, based on a JDK 7 snapshot:

private static final int get16(byte b[], int off) {
    return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8);
}

private static final long get32(byte b[], int off) {
    return (get16(b, off) | ((long)get16(b, off+2) << 16)) & 0xffffffffL;
}

private static final long get64(byte b[], int off) {
    return get32(b, off) | (get32(b, off+4) << 32);
}

@SuppressWarnings("deprecation")
private static Date dosToJavaTime(long dtime) {
    Date date = new Date((int)(((dtime >> 25) & 0x7f) + 80),
                      (int)(((dtime >> 21) & 0x0f) - 1),
                      (int)((dtime >> 16) & 0x1f),
                      (int)((dtime >> 11) & 0x1f),
                      (int)((dtime >> 5) & 0x3f),
                      (int)((dtime << 1) & 0x3e));
    return date;
}
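
As an alternative to scanning for header signatures, the end of central directory record can be used to locate the central directory directly, since it stores the entry count, the directory's size and its start offset. The following is a rough sketch under a few assumptions: a single-disk archive, no ZIP64 extensions, and a class name (CentralDirectoryLocator) invented for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the original answer. ZIP64 archives are not handled.
final class CentralDirectoryLocator {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int EOCD_FIXED_SIZE = 22;   // record size without the archive comment
    private static final int MAX_COMMENT = 0xFFFF;   // comment length is a 16-bit field

    /** Returns { entryCount, centralDirectorySize, centralDirectoryOffset }. */
    public static long[] locate(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            // the record can only live in the last 22 + 65535 bytes of the file
            int tailSize = (int) Math.min(length, EOCD_FIXED_SIZE + MAX_COMMENT);
            byte[] tail = new byte[tailSize];
            raf.seek(length - tailSize);
            raf.readFully(tail);
            ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
            // search backwards so the record closest to the end of the file wins
            for (int i = tailSize - EOCD_FIXED_SIZE; i >= 0; i--) {
                if (buf.getInt(i) != EOCD_SIGNATURE)
                    continue;
                // the comment must fill exactly the rest of the file, which rejects most false positives
                int commentLength = buf.getShort(i + 20) & 0xFFFF;
                if (i + EOCD_FIXED_SIZE + commentLength != tailSize)
                    continue;
                long entryCount = buf.getShort(i + 10) & 0xFFFFL;
                long directorySize = buf.getInt(i + 12) & 0xFFFFFFFFL;
                long directoryOffset = buf.getInt(i + 16) & 0xFFFFFFFFL;
                return new long[] { entryCount, directorySize, directoryOffset };
            }
            throw new IOException("No end of central directory record found in " + file);
        }
    }
}
```

Reading exactly directorySize bytes starting at directoryOffset then yields all central file headers back to back, so no fixed 10 MB window is needed.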
answered 2013-07-08T08:51:06.330