java - Handling compressed XML documents with dom4j

Question

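Specifically, I'm using dom4j to read KML documents and parse some data out of the XML. When I just pass the URL as a string to the reader, it's very simple and handles both file-system URLs and web URLs: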

SAXReader reader = new SAXReader();
Document document = reader.read(url);

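The problem is that sometimes my code needs to handle KMZ documents, which are basically just zipped XML (KML) documents. Unfortunately, SAXReader has no convenient way to deal with this. I've found various clever tricks for determining whether a given file is a ZIP file, but my code quickly got ugly: reading the stream, building a File, checking the "magic" hex bytes at the beginning, extracting, and so on.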

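Is there a quick and clean way to handle this? Some simpler way to connect to any URL and extract the content if it is compressed, and otherwise just get the XML?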
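One protocol-independent alternative, sketched below under stated assumptions, is to skip the file/web split entirely: open the URL with URL.openStream(), wrap it in a BufferedInputStream, peek at the first four bytes with mark()/reset(), and only wrap it in a ZipInputStream if those bytes are the ZIP signature "PK\3\4". The class name KmlStreams and its methods are hypothetical; the sketch assumes Java 9+ (for readNBytes) and, like the accepted answer, that the KML document is the first entry in the archive.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Arrays;
import java.util.zip.ZipInputStream;

// Hypothetical helper class: returns an XML stream whether or not the source is zipped.
public class KmlStreams {

    // Local file header signature that starts a non-empty ZIP archive: "PK\3\4"
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Opens any URL that URL.openStream() supports (file:, http:, https:, ...). */
    public static InputStream openXml(URL url) throws IOException {
        return unwrapIfZip(url.openStream());
    }

    /**
     * Peeks at the first four bytes. If they match the ZIP signature, returns a
     * ZipInputStream positioned at the first entry; otherwise returns the buffered
     * original stream, rewound so no bytes are lost.
     */
    public static InputStream unwrapIfZip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(ZIP_MAGIC.length);
        byte[] header = new byte[ZIP_MAGIC.length];
        int read = in.readNBytes(header, 0, header.length); // Java 9+, stops early only at EOF
        in.reset(); // rewind so the XML parser or ZipInputStream sees every byte

        if (read == ZIP_MAGIC.length && Arrays.equals(header, ZIP_MAGIC)) {
            ZipInputStream zip = new ZipInputStream(in);
            if (zip.getNextEntry() == null) {
                zip.close();
                throw new IOException("ZIP archive has no entries");
            }
            return zip; // reads now return the bytes of the first entry only
        }
        return in;
    }
}
```

With dom4j, the returned stream can be passed straight to SAXReader.read(InputStream), ideally inside a try-with-resources block so the file or connection is closed afterwards. Sniffing the bytes instead of trusting the Content-Type header also copes with servers that label KMZ files as application/octet-stream.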

score 0 · Accepted Answer

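Hmm, KMZDOMLoader doesn't seem to handle KMZ files on the web. A KMZ may be generated dynamically, so it doesn't always have a) a file reference or b) a .kmz extension; it has to be identified by its content type.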
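The answer's code tests the header with String.contains. A slightly stricter check, sketched here as a hypothetical helper isKmz (not part of dom4j or the JDK), extracts the media type first: media types are case-insensitive, may be followed by parameters such as "; charset=...", and HttpURLConnection.getContentType() returns null when the server sends no header.

```java
import java.util.Locale;

// Hypothetical helper: decides from a Content-Type header value whether a resource is KMZ.
public class KmzContentType {

    private static final String KMZ_MEDIA_TYPE = "application/vnd.google-earth.kmz";

    /**
     * Returns true if the header names the KMZ media type. The value is cut at the
     * first ';' (dropping parameters), trimmed and lower-cased before comparing.
     * A null header is treated as "not KMZ".
     */
    public static boolean isKmz(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mediaType = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return mediaType.equals(KMZ_MEDIA_TYPE);
    }
}
```

In the HTTP branch it would stand in for the contains test, for example isKmz(connection.getContentType()). It still depends on the server labelling the file correctly.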

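What I ended up doing was building a URL object and then getting its protocol. I have separate logic for local files and for documents on the web. Then, inside each branch, I have to determine whether the content is compressed. SAXReader's read() method accepts an InputStream, so I found I could use a ZipInputStream for KMZs.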
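The answer calls getNextEntry() once, so it parses whatever the archive's first entry happens to be. A KMZ can also bundle images or other files alongside the KML. The hypothetical helper below instead advances to the first non-directory entry whose name ends in .kml, assuming that entry is the document to parse (the conventional name for it is doc.kml).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: positions a KMZ stream at its first .kml entry.
public class KmzEntries {

    /**
     * Advances through the archive to the first non-directory entry whose name ends
     * in ".kml" and returns the stream positioned there. Throws IOException if the
     * archive has no such entry.
     */
    public static ZipInputStream openFirstKmlEntry(InputStream kmz) throws IOException {
        ZipInputStream zip = new ZipInputStream(kmz);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (!entry.isDirectory() && entry.getName().toLowerCase(Locale.ROOT).endsWith(".kml")) {
                return zip; // reads from this stream now return only this entry's bytes
            }
        }
        zip.close();
        throw new IOException("no .kml entry found in KMZ archive");
    }
}
```

The returned ZipInputStream can be handed to SAXReader.read(InputStream) just like the stream in the answer's code; reads stop at the end of the selected entry.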

Here is the code I ended up with:

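import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.zip.ZipInputStream;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

// "PK\3\4": the local file header signature at the start of a ZIP (and therefore KMZ) file
private static final int ZIP_MAGIC_NUMBERS = 0x504B0304;
private static final String KMZ_CONTENT_TYPE = "application/vnd.google-earth.kmz";

private Document getDocument(String urlString) throws IOException, DocumentException, URISyntaxException {
    URL url = new URL(urlString);
    String protocol = url.getProtocol();
    InputStream inputStream;

    /*
     * Figure out how to get the XML from the URL -- there are 4 possibilities:
     *
     * 1)  a KML (uncompressed) doc on the filesystem
     * 2)  a KMZ (compressed) doc on the filesystem
     * 3)  a KML (uncompressed) doc on the web
     * 4)  a KMZ (compressed) doc on the web
     */
    if (protocol.equalsIgnoreCase("file")) {
        // the provided input URL points to a file on a file system
        File file = new File(url.toURI());
        int n;
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            // readInt() reads 4 bytes big-endian; guard against files shorter than that
            n = raf.length() >= 4 ? raf.readInt() : 0;
        }

        if (n == ZIP_MAGIC_NUMBERS) {
            // the file is a KMZ file
            inputStream = openFirstEntry(new FileInputStream(file));
        } else {
            // the file is a KML file
            inputStream = new FileInputStream(file);
        }

    } else if (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https")) {
        // the provided input URL points to a web location
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.connect();

        // getContentType() returns null when the server sends no Content-Type header
        String contentType = connection.getContentType();

        if (contentType != null && contentType.contains(KMZ_CONTENT_TYPE)) {
            // the target resource is KMZ
            inputStream = openFirstEntry(connection.getInputStream());
        } else {
            // the target resource is KML
            inputStream = connection.getInputStream();
        }

    } else {
        throw new IOException("Unsupported URL protocol: " + protocol);
    }

    // try-with-resources closes the stream even if parsing throws
    try (InputStream in = inputStream) {
        return new SAXReader().read(in);
    }
}

// Wraps a raw KMZ stream and positions it at the first archive entry,
// which is assumed to be the KML document.
private static InputStream openFirstEntry(InputStream raw) throws IOException {
    ZipInputStream zip = new ZipInputStream(raw);
    if (zip.getNextEntry() == null) {
        zip.close();
        throw new IOException("KMZ archive contains no entries");
    }
    return zip;
}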
