java - How can I skip DTD validation but keep the Doctype when writing an XML file?

Question

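I am working on a system that should be able to read any (or at least any well-formed) XML file, manipulate a few nodes, and write them back into that same file. I want my code to be as generic as possible, and I do not want:

- hard-coded references to Schema/Doctype information anywhere in my code. The doctype information is in the source document; I want to keep exactly that doctype information and not provide it again from within my code. If a document has no DocType, I won't add one. I do not care about the form or content of these files at all, except for my few nodes.
- custom EntityResolvers or StreamFilters to omit or otherwise manipulate the source information (it is already a pity that namespace information seems to be inaccessible from the document file where it is declared, but I can manage by using uglier XPaths).
- DTD validation. I don't have the referenced DTDs, I don't want to include them, and node manipulation is perfectly possible without knowing about them.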

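The aim is to leave the source file entirely unchanged except for the changed nodes, which are retrieved via XPath. I would like to get by with the standard javax.xml stuff.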

My progress so far:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setAttribute("http://xml.org/sax/features/namespaces", true);
    factory.setAttribute("http://xml.org/sax/features/validation", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    factory.setNamespaceAware(true);
    factory.setIgnoringElementContentWhitespace(false);
    factory.setIgnoringComments(false);
    factory.setValidating(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(inStream));

This successfully loads the XML source into an org.w3c.dom.Document while ignoring DTD validation. I can make my replacements, and then I use

    Source source = new DOMSource(document);
    Result result = new StreamResult(getOutputStream(getPath()));

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);

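to write it back, which is nearly perfect. But the Doctype tag is gone, whatever I do. While debugging, I saw that there is a DeferredDoctypeImpl [log4j:configuration: null] object in the Document object after parsing, but it is somehow wrong, empty, or ignored. The file I tested starts like this (but the same happens for other file types):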

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">

[...]
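A note on the debugger output above: DOM defines the node value of a DocumentType node as null, so the `null` in `[log4j:configuration: null]` is most likely just that node value and not a sign that the doctype was lost. A minimal sketch for checking what the parser actually kept, assuming the Xerces-based parser bundled with the JDK (the class name `DoctypeInspection` and the sample string are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeInspection {

    // Hypothetical sample input, modeled on the file header shown above.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">\n"
            + "<log4j:configuration"
            + " xmlns:log4j=\"http://jakarta.apache.org/log4j/\""
            + " debug=\"false\"/>";

    /** Parses without validating and without fetching the external DTD. */
    static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature, same URI as in the question.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        DocumentType doctype = parseWithoutDtd(SAMPLE).getDoctype();
        // The identifiers live in dedicated getters, not in the node value.
        System.out.println("name     = " + doctype.getName());
        System.out.println("publicId = " + doctype.getPublicId());
        System.out.println("systemId = " + doctype.getSystemId());
    }
}
```

If the name and system identifier show up here, the information survives parsing and the loss happens at serialization time.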

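I think there are a lot of (easy?) ways that involve hacks or pulling additional JARs into the project. But I would rather make it work with the tools I already use.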

score 1 · Accepted Answer

Sorry, got it now by using XMLSerializer instead of the Transformer...
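For readers who would rather stay with the Transformer from the question: the identity Transformer only writes a DOCTYPE when the corresponding output properties are set, so one common workaround is to copy the identifiers from the parsed document at write time. The sketch below is not what this answer used; it is an alternative under the assumption that only the public and system identifiers matter (an internal DTD subset would not be reproduced). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class TransformerDoctype {

    /** Serializes doc, re-emitting the identifiers of its own DOCTYPE. */
    static String serialize(Document doc) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            // Nothing is hard-coded: both values come from the source document.
            if (doctype.getPublicId() != null) {
                xformer.setOutputProperty(
                        OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                xformer.setOutputProperty(
                        OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Non-validating parse that does not fetch the external DTD. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input, modeled on the header in the question.
        String xml = "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">"
                + "<log4j:configuration"
                + " xmlns:log4j=\"http://jakarta.apache.org/log4j/\"/>";
        System.out.println(serialize(parse(xml)));
    }
}
```

Documents without a DOCTYPE pass through unchanged, because no output property is set in that case.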

Answered 2009-02-24T16:29:03.920

score 0 · Accepted Answer

Here is how you could use the LSSerializer found in the JDK:

    private void writeDocument(Document doc, String filename)
            throws IOException {
        Writer writer = null;
        try {
            /*
             * Could extract "ls" to an instance attribute, so it can be reused.
             */
            DOMImplementationLS ls = (DOMImplementationLS) 
                    DOMImplementationRegistry.newInstance().
                            getDOMImplementation("LS");
            /*
             * If "doc" has been constructed by parsing an XML document, we
             * should keep its encoding when serializing it; if it has been
             * constructed in memory, its encoding has to be decided by the
             * client code. Falling back to UTF-8 here is an assumption.
             */
            String encoding = doc.getXmlEncoding() != null
                    ? doc.getXmlEncoding() : "UTF-8";
            /*
             * Encode the file with the same charset that is declared in the
             * XML prolog; the platform default charset may differ from it.
             */
            writer = new OutputStreamWriter(
                    new FileOutputStream(filename), encoding);
            LSOutput lsout = ls.createLSOutput();
            lsout.setCharacterStream(writer);
            lsout.setEncoding(encoding);
            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, lsout);
        } catch (Exception e) {
            throw new IOException(e);
        } finally {
            if (writer != null) writer.close();
        }
    }

Required imports:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

I know this is an old question that has already been answered, but I think the technical details might help someone.
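Whether the doctype survives this route can depend on the JDK and on how the document was built, so it may be worth checking in your own environment. A small in-memory round trip, written as a sketch (the class name `LsRoundTrip` and the sample input are hypothetical), parses a string without loading the DTD and serializes it again through LSSerializer:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class LsRoundTrip {

    /** Parses xml without fetching its DTD and writes it back to a string. */
    static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance()
                        .getDOMImplementation("LS");
        StringWriter out = new StringWriter();
        LSOutput lsout = ls.createLSOutput();
        lsout.setCharacterStream(out);
        // Only affects the encoding named in the XML declaration here,
        // because the output goes to a character stream.
        lsout.setEncoding("UTF-8");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(doc, lsout);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input, modeled on the header in the question.
        String xml = "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">"
                + "<log4j:configuration"
                + " xmlns:log4j=\"http://jakarta.apache.org/log4j/\"/>";
        String result = roundTrip(xml);
        System.out.println(result);
        System.out.println("doctype kept: " + result.contains("<!DOCTYPE"));
    }
}
```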

score 0 · Accepted Answer

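I tried using the LSSerializer library but could not get anywhere with it in terms of retaining the Doctype. This is the solution that Stephan probably used. Note: this is in Scala, but it uses a Java library, so just convert the code.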

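import java.io.{File, FileOutputStream, OutputStreamWriter}
import org.w3c.dom.Element
import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

  def transformXML(root: Element, file: String): Unit = {
    val doc = root.getOwnerDocument
    // OutputFormat(doc) takes the document type identifiers from the document
    val format = new OutputFormat(doc)
    format.setIndenting(true)
    val writer = new OutputStreamWriter(new FileOutputStream(new File(file)))
    try {
      val serializer = new XMLSerializer(writer, format)
      serializer.serialize(doc)
    } finally {
      writer.close() // flush buffered output and release the file handle
    }
  }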
