I'm pretty-printing (indenting) some XML in Java:

<div xml:space="default"><h1 xml:space="default">Indenting mixed content in Java</h1><p xml:space="preserve">Why does indenting mixed content (like this paragraph) add whitespace around <a href="http://www.stackoverflow.com" xml:space="preserve"><strong>this strong element</strong></a>?</p></div>

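When I pretty-print the XML, I don't want whitespace added to the content of the <a> element, so I specified xml:space="preserve", expecting the transformer to preserve the whitespace inside it.

However, when I transform the XML, I get this: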

<div>
    <h1 xml:space="default">Indenting mixed content in Java</h1>
    <p>Why does indenting mixed content (like this paragraph) add whitespace around <a href="http://www.stackoverflow.com">
            <strong xml:space="preserve">this strong element</strong>
        </a>?</p>
</div>

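...with extra whitespace between the <a> and <strong> elements. (Not only that, the </a> closing tag is awkwardly misaligned with its opening tag.)

How can I prevent the pretty-printer from adding that whitespace? Am I doing something wrong? Here is the Java code I'm using: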

import org.w3c.dom.Element;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import java.io.StringWriter;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.stream.StreamResult;

public class XmlExample {

    public static void main(String[] argv) {
        Document xmlDoc    = parseXml("<div xml:space=\"default\">" + 
                                          "<h1 xml:space=\"default\">Indenting mixed content in Java</h1>" + 
                                          "<p xml:space=\"preserve\">Why does indenting mixed content (like this paragraph) add whitespace around " + 
                                              "<a href=\"http://www.stackoverflow.com\" xml:space=\"preserve\"><strong>this strong element</strong></a>?" + 
                                          "</p>" + 
                                      "</div>");
        String   xmlString = xmlToString(xmlDoc.getDocumentElement());
        System.out.println(xmlString);
    }

    public static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true);
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

            Document doc = docBuilder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8"))); 
            return doc;
        }
        catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static String xmlToString(Element el) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            // No stylesheet is supplied, so this is an identity transformer (it only serializes).
            Transformer transformer = tf.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            DOMSource source = new DOMSource(el);
            transformer.transform(source, new StreamResult(writer));
            return writer.getBuffer().toString().trim();
        }
        catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

}
1 Answer

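If you use a serializer that conforms to the XSLT 1.0 or XSLT 2.0 specification, it should respect xml:space (that is, indentation should be suppressed within the scope of xml:space="preserve"). The XSLT 2.0 specification is more explicit on this point than XSLT 1.0 and makes it a "must" rather than a "should" requirement.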
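
To make "within the scope of xml:space" concrete, here is a minimal sketch that uses only the JDK's DOM API. It is not how any particular serializer is implemented; the class name XmlSpaceScope and its methods are hypothetical names chosen for illustration. It walks from an element up through its ancestors and reports whether the nearest xml:space declaration is "preserve", which is the condition under which a conforming serializer is expected not to indent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, for illustration only.
public class XmlSpaceScope {

    /** True if the nearest xml:space on el or one of its ancestors is "preserve". */
    public static boolean inPreserveScope(Element el) {
        // Stop when the parent is no longer an element (i.e. at the Document node).
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            // getAttributeNS returns "" when the attribute is absent.
            String value = ((Element) n).getAttributeNS(XMLConstants.XML_NS_URI, "space");
            if (value.equals("preserve")) {
                return true;
            }
            if (value.equals("default")) {
                return false; // a nearer "default" overrides an outer "preserve"
            }
        }
        return false; // no declaration anywhere: default whitespace handling
    }

    /** Parses an XML string with a namespace-aware DOM parser. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<div xml:space=\"default\"><h1>Title</h1>"
              + "<p xml:space=\"preserve\">text <a><strong>strong</strong></a>?</p></div>");
        for (String name : new String[] {"div", "h1", "p", "a", "strong"}) {
            Element el = (Element) doc.getElementsByTagName(name).item(0);
            System.out.println(name + ": "
                    + (inPreserveScope(el) ? "indentation should be suppressed" : "may be indented"));
        }
    }
}
```

In the question's document, <a> and <strong> both sit inside <p xml:space="preserve">, so a conforming serializer would be expected to leave their surroundings untouched even without the extra attribute on <a>.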

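You are using a JAXP identity transformation rather than an XSLT transformation; the JAXP specification does reference the XSLT 1.0 specification, but the reference is somewhat vague.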

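If you use Saxon, you should get the desired behavior. Saxon also lets you suppress indentation for specific elements with the SUPPRESS_INDENTATION output parameter, so you don't even need to include xml:space in the document being serialized.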
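
A rough sketch of what that could look like through JAXP follows. Two things here are assumptions on my part rather than something verified against a specific Saxon release: the factory class name net.sf.saxon.TransformerFactoryImpl, and the Clark-notation property name {http://saxon.sf.net/}suppress-indentation (the string I believe SUPPRESS_INDENTATION corresponds to). Check both against the documentation for your Saxon version. The class and method names are hypothetical. So that the sketch still runs without the Saxon jar, it falls back to the JDK's built-in transformer, which is expected to accept the namespaced property but not act on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, for illustration only.
public class SaxonIndentExample {

    // Assumption: Saxon's JAXP factory class; needs the Saxon jar on the classpath.
    static final String SAXON_FACTORY = "net.sf.saxon.TransformerFactoryImpl";

    // Assumption: Clark-notation name of Saxon's suppress-indentation output property.
    static final String SUPPRESS_INDENTATION = "{http://saxon.sf.net/}suppress-indentation";

    /** Returns Saxon's factory if it can be loaded, otherwise the JDK default. */
    static TransformerFactory newFactory() {
        try {
            return TransformerFactory.newInstance(SAXON_FACTORY, null);
        } catch (TransformerFactoryConfigurationError e) {
            // Saxon is not available; the property below will then have no effect.
            return TransformerFactory.newInstance();
        }
    }

    /**
     * Indents xml, asking the serializer not to indent inside the named elements.
     * elementNames is a whitespace-separated list, e.g. "p".
     */
    public static String indent(String xml, String elementNames) throws Exception {
        Transformer transformer = newFactory().newTransformer(); // identity transform
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(SUPPRESS_INDENTATION, elementNames);
        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
        return writer.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Same document as in the question, but with no xml:space attributes.
        String xml = "<div><h1>Indenting mixed content in Java</h1>"
                   + "<p>Why does indenting mixed content add whitespace around "
                   + "<a href=\"http://www.stackoverflow.com\"><strong>this strong element</strong></a>?</p>"
                   + "</div>";
        System.out.println(indent(xml, "p"));
    }
}
```

Naming the elements in the output property keeps the serialization policy in the Java code instead of in the document, which is the advantage described above.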

answered 2013-06-12T08:17:41.357