java - Best way to detect whether a stream is zipped in Java

Question

What is the best way to find out whether a java.io.InputStream contains zipped data?

score 44 · Accepted Answer

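Introduction

Since all the answers here are 5 years old, I feel obliged to write down how things stand today. I seriously doubt that one should read the magic bytes of the stream! That is low-level code, and it should generally be avoided.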

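Simple answer

miku writes:

If the Stream can be read via ZipInputStream, it should be zipped.

Yes, but in the case of ZipInputStream, "can be read" means that the first call to .getNextEntry() returns a non-null value. No exception catching or anything of the sort is needed. So instead of parsing magic bytes, you can simply do: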

boolean isZipped = new ZipInputStream(yourInputStream).getNextEntry() != null;

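That's it!

General thoughts on unzipping

In general, it seems much more convenient to work with files than with streams when [un]zipping. There are several useful libraries, and ZipFile has more functionality than ZipInputStream. Handling of zip files is discussed here: What is a good Java library to zip/unzip files? So if you can work with files, you had better do so!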
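To illustrate the file-based route, here is a minimal sketch that uses only java.util.zip.ZipFile. The class name ZipFileProbe and its two helper methods are made up for this example; it assumes that a ZipException from the constructor is a good enough signal that the file is not a ZIP archive.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of any library.
final class ZipFileProbe {

    /** Returns true if the file can be opened as a ZIP archive. */
    static boolean isZipFile(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            return true;
        } catch (ZipException e) {
            // The constructor rejects files that lack a valid ZIP structure.
            return false;
        }
    }

    /** Lists entry names without decompressing any entry data. */
    static List<String> entryNames(File file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ZipFileProbe <file>");
            return;
        }
        File file = new File(args[0]);
        if (isZipFile(file)) {
            System.out.println(entryNames(file));
        } else {
            System.out.println("not a zip file");
        }
    }
}
```

Unlike the stream one-liner, this reads the archive's central directory, so it also accepts an archive that has no entries at all.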

Code sample

In my application I had to work with streams only. This is the unzip method I wrote:

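import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public boolean unzip(InputStream inputStream, File outputFolder) throws IOException {
    boolean isEmpty = true;
    // try-with-resources closes the stream even if extraction fails
    try (ZipInputStream zis = new ZipInputStream(inputStream)) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            isEmpty = false;
            File newFile = new File(outputFolder, entry.getName());
            // Reject entries such as "../x" that would land outside outputFolder
            if (!newFile.getCanonicalFile().toPath().startsWith(outputFolder.getCanonicalFile().toPath())) {
                throw new IOException("Entry outside target folder: " + entry.getName());
            }
            if (entry.isDirectory()) {
                newFile.mkdirs();
                continue;
            }
            // mkdirs() returns false when the folder already exists,
            // so its result must not decide whether the file is written
            newFile.getParentFile().mkdirs();
            try (OutputStream fos = new FileOutputStream(newFile)) {
                IOUtils.copy(zis, fos);
            }
        }
    }
    // true if at least one entry was read, i.e. the stream was zipped
    return !isEmpty;
}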

score 23 · Accepted Answer

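The magic bytes for the ZIP format are 50 4B. You could test the stream (using mark and reset - you may need to buffer), but I would not expect this to be a 100% reliable approach. There is no way to distinguish it from a US-ASCII encoded text file that begins with the letters PK.

The best way would be to provide metadata about the content format before opening the stream, and then handle it accordingly.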
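A minimal sketch of the mark/reset test, including the false positive with plain text that starts with "PK". The class MagicBytes and the method startsWithPk are names invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class MagicBytes {

    /** Peeks at the first two bytes and restores the stream position afterwards. */
    static boolean startsWithPk(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException(
                    "Stream must support mark/reset, e.g. wrap it in a BufferedInputStream");
        }
        in.mark(2);
        try {
            int first = in.read();
            int second = in.read();
            return first == 0x50 && second == 0x4B; // 'P', 'K'
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream text = new BufferedInputStream(new ByteArrayInputStream(
                "PK is just two letters".getBytes(StandardCharsets.US_ASCII)));
        // Plain text, yet the check passes: this prints true.
        System.out.println(startsWithPk(text));
    }
}
```

Because the position is restored in a finally block, the caller can hand the same stream to whatever processing comes next, whichever way the check turns out.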

score 6 · Accepted Answer

Not very elegant, but reliable:

If the Stream can be read via ZipInputStream, it should be zipped.

score 6 · Accepted Answer

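You can check whether the first four bytes of the stream are the local file header signature. It starts the local file header that precedes every file in a ZIP file and, as shown in the spec here, is 50 4B 03 04.

A little test code shows that this works: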

byte[] buffer = new byte[4];

try {
    ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("so.zip"));
    ZipEntry ze = new ZipEntry("HelloWorld.txt");
    zos.putNextEntry(ze);
    zos.write("Hello world".getBytes());
    zos.close();

    FileInputStream is = new FileInputStream("so.zip");
    is.read(buffer);
    is.close();
}
catch(IOException e) {
    e.printStackTrace();
}

for (byte b : buffer) { 
    System.out.printf("%H ",b);
}

This gave me the following output:

50 4B 3 4
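The same check can be sketched against a stream without touching the file system, using a PushbackInputStream so that the inspected bytes can be read again afterwards. LocalHeaderCheck, startsWithLocalFileHeader and sampleZip are names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class LocalHeaderCheck {

    private static final byte[] LOCAL_FILE_HEADER = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Reads up to four bytes, compares them with the signature, then pushes
     * them back. The stream needs a pushback buffer of at least four bytes.
     */
    static boolean startsWithLocalFileHeader(PushbackInputStream in) throws IOException {
        byte[] head = new byte[LOCAL_FILE_HEADER.length];
        int total = 0;
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) {
                break; // stream is shorter than the signature
            }
            total += n;
        }
        in.unread(head, 0, total);
        return total == head.length && Arrays.equals(head, LOCAL_FILE_HEADER);
    }

    /** Builds a one-entry archive in memory. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("HelloWorld.txt"));
            zos.write("Hello world".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(sampleZip()), 4);
        System.out.println(startsWithLocalFileHeader(in));
    }
}
```

One limitation to keep in mind: an archive with no entries has no local file header and starts with the end-of-central-directory signature 50 4B 05 06 instead, so this check reports false for it.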

score 0 · Accepted Answer

I combined the answers from @McDowell and @Innokenty into a small lib function that you can paste into your project:

public static boolean isZipStream(InputStream inputStream) {
    if (inputStream == null || !inputStream.markSupported()) {
        throw new IllegalArgumentException("InputStream must support mark/reset. Wrap it in a BufferedInputStream.");
    }
    boolean isZipped = false;
    try {
        inputStream.mark(2048);
        // Do not close this ZipInputStream: that would also close inputStream
        isZipped = new ZipInputStream(inputStream).getNextEntry() != null;
        inputStream.reset();
    } catch (IOException ex) {
        // cannot be opened as zip
    }
    return isZipped;
}

You can use the function like this:

public static void main(String[] args) {
    InputStream inputStream = new BufferedInputStream(...);

    if (isZipStream(inputStream)) {
        // do zip processing using inputStream
    } else {
        // do non-zip processing using inputStream
    }

}

score 0 · Accepted Answer

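Since .zip and .xlsx have the same magic number, I could not identify a real zip file (if the file had been renamed).

So I used Apache Tika to find the exact document type.

Even if the file is renamed to .zip, it finds the exact type.

Reference: https://www.baeldung.com/apache-tika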

score 0 · Accepted Answer


Checking the magic number may not be the right choice.

Docx files have the same magic number, 50 4B 3 4.

answered 2015-11-12T05:49:58.573
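The reason is that .docx and .xlsx files are themselves ZIP containers. As a rough sketch using only java.util.zip, one can look for the [Content_Types].xml entry that Office Open XML packages carry. The names ZipKind, Kind, classify and zipOf are invented for this example, and treating that single entry as the marker is a simplifying assumption; telling .docx from .xlsx would need a look inside the package, which is the kind of work a detection library does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class ZipKind {

    enum Kind { NOT_ZIP, PLAIN_ZIP, OOXML_PACKAGE }

    /** Consumes the stream; the caller stays responsible for closing it. */
    static Kind classify(InputStream in) throws IOException {
        ZipInputStream zis = new ZipInputStream(in);
        boolean sawEntry = false;
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            sawEntry = true;
            // Assumption: this entry marks an Office Open XML package.
            if ("[Content_Types].xml".equals(entry.getName())) {
                return Kind.OOXML_PACKAGE;
            }
        }
        return sawEntry ? Kind.PLAIN_ZIP : Kind.NOT_ZIP;
    }

    /** Builds an in-memory archive with the given entry names (dummy content). */
    static byte[] zipOf(String... entryNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("x".getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(classify(new ByteArrayInputStream(zipOf("HelloWorld.txt"))));
        System.out.println(classify(new ByteArrayInputStream(
                zipOf("[Content_Types].xml", "word/document.xml"))));
    }
}
```

Since classify reads entries until it finds the marker or reaches the end, it is only suitable when the stream can be reopened or was buffered beforehand.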
