
I'm writing to an XML file, and the indentation (tabs) is coming out wrong:

<BusinessEvents>

<MailEvent>
          <to>Wellington</to>
          <weight>10.0</weight>
          <priority>air priority</priority>
          <volume>10.0</volume>
          <from>Christchurch</from>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <PPW>8.0</PPW>
          <PPV>2.5</PPV>
     </MailEvent>
<DiscontinueEvent>
          <to>Wellington</to>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <from>Sydney</from>
     </DiscontinueEvent>
<RoutePriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <duration>15.0</duration>
          <maxweight>40.0</maxweight>
          <maxvolume>20.0</maxvolume>
          <priority>air priority</priority>
          <company>Kiwi Co</company>
          <day>Mon May 20 14:30:08 NZST 2013</day>
          <frequency>3.0</frequency>
          <from>Wellington</from>
          <volumecost>2.0</volumecost>
     </RoutePriceUpdateEvent>
<CustomerPriceUpdateEvent>
          <weightcost>3.0</weightcost>
          <to>Wellington</to>
          <priority>air priority</priority>
          <from>Sydney</from>
          <volumecost>2.0</volumecost>
     </CustomerPriceUpdateEvent>
</BusinessEvents>

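As you can see, the first child node isn't indented at all, but that node's children are indented twice? And then the closing tag is indented only once?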

I suspect this may have something to do with adding the root to the document via doc.appendChild(root), but when I do that I get the error

"An attempt was made to insert a node where it is not permitted."
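For context: the DOM allows only one document element per Document. After parse(), doc already holds <BusinessEvents>, so appending another element directly to doc is expected to fail with DOMException.HIERARCHY_REQUEST_ERR, which matches the error quoted above. A minimal sketch that reproduces this, assuming a small in-memory string in place of businessevents.xml (the class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class SecondRootDemo {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    /**
     * Tries to append a second element directly to the Document and returns
     * the DOMException code, or -1 if no exception was thrown.
     */
    static short appendSecondRoot() {
        // Hypothetical stand-in for businessevents.xml
        Document doc = parse("<BusinessEvents><MailEvent/></BusinessEvents>");
        try {
            // doc already has <BusinessEvents> as its document element
            doc.appendChild(doc.createElement("BusinessEvents"));
            return (short) -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        short code = appendSecondRoot();
        System.out.println(code == DOMException.HIERARCHY_REQUEST_ERR
                ? "HIERARCHY_REQUEST_ERR: a second document element is not permitted"
                : "unexpected result: " + code);
    }
}
```

Appending new events to root (the existing document element), as the code below does with root.appendChild(element), avoids this error.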

Here is my parser:

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;
try {
    icBuilder = icFactory.newDocumentBuilder();
    String businessEventsFile = System.getProperty("user.dir") + "/testdata/businessevents/businessevents.xml";
    Document doc = icBuilder.parse(businessEventsFile);

    // the existing <BusinessEvents> element parsed from the file
    Element root = doc.getDocumentElement();

    // pick the element name from the event's runtime type
    Element element;
    if (event instanceof CustomerPriceUpdateEvent) {
        element = doc.createElement("CustomerPriceUpdateEvent");
    } else if (event instanceof DiscontinueEvent) {
        element = doc.createElement("DiscontinueEvent");
    } else if (event instanceof MailEvent) {
        element = doc.createElement("MailEvent");
    } else if (event instanceof RoutePriceUpdateEvent) {
        element = doc.createElement("RoutePriceUpdateEvent");
    } else {
        throw new Exception("business event isn't valid");
    }

    // one child element per field, e.g. <to>Wellington</to>
    for (Map.Entry<String, String> field : event.getFields().entrySet()) {
        Element newElement = doc.createElement(field.getKey());
        newElement.appendChild(doc.createTextNode(field.getValue()));
        element.appendChild(newElement);
    }

    root.appendChild(element);

    // write the updated DOM back to the same file
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
//  transformer.setOutputProperty(OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "5");
    DOMSource source = new DOMSource(doc);
    StreamResult fileResult = new StreamResult(businessEventsFile);
    transformer.transform(source, fileResult);
} catch (Exception e) {
    // (exception handling was not shown in the original snippet)
    throw new RuntimeException(e);
}

Any insight would be greatly appreciated.


1 Answer


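I had the same problem a while ago. I found that the cause was that the parsed document contained whitespace, stored as text nodes, throughout the document.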

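For example, after parsing the document, you probably have a whitespace text node under the <BusinessEvents> node, just before the <MailEvent> node. The Transformer preserves whitespace text nodes (which I think is the correct behavior).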
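A small sketch of what the parser produces, assuming a trimmed-down copy of the input held in a string (the class and method names are hypothetical). It lists the children of the document element; the line breaks between the tags show up as whitespace-only text nodes on either side of <MailEvent>:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceNodesDemo {

    /** Describes each child of the document element, e.g. "WHITESPACE,ELEMENT:MailEvent". */
    static String describeRootChildren(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        StringBuilder sb = new StringBuilder();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (sb.length() > 0) {
                sb.append(',');
            }
            if (child.getNodeType() == Node.TEXT_NODE) {
                // without a DTD, whitespace between tags is kept as ordinary text nodes
                sb.append(child.getNodeValue().trim().isEmpty() ? "WHITESPACE" : "TEXT");
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("ELEMENT:").append(child.getNodeName());
            } else {
                sb.append("OTHER");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<BusinessEvents>\n<MailEvent>\n  <to>Wellington</to>\n</MailEvent>\n</BusinessEvents>";
        // prints WHITESPACE,ELEMENT:MailEvent,WHITESPACE
        System.out.println(describeRootChildren(xml));
    }
}
```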

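So if the XML text has no whitespace at all between the tags, the Transformer indents the tags correctly. You can try this with your code by manually removing all whitespace between the tags in the input (including line breaks) and then formatting it. The output will probably look much more like what you expect.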
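To see the Transformer behave well on whitespace-free input, here is a minimal sketch that indents a compact, single-line document. It assumes the Xalan-based Transformer bundled with the JDK, which recognises the {http://xml.apache.org/xslt}indent-amount property used in the question; other implementations may ignore that property. The class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

class CompactIndentDemo {

    /** Parses xml and re-serializes it with the given indent amount. */
    static String indent(String xml, int amount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // implementation-specific property (Xalan / JDK built-in transformer)
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(amount));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString().replace("\r\n", "\n"); // normalise line endings
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // no whitespace between tags, so every level gets a consistent indent
        System.out.println(indent(
                "<BusinessEvents><MailEvent><to>Wellington</to></MailEvent></BusinessEvents>", 4));
    }
}
```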

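One way to solve this is to remove the redundant whitespace from the document after it has been parsed. Simply removing every whitespace-only text node makes the formatting look better, but the question is whether some of those whitespace text nodes are actually needed.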

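So my approach to cleaning the document before formatting is to remove every text node that contains only whitespace, except when its parent has no element, CDATA, or comment children (for example, when the whitespace text node is its parent's only child).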
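As a more compact alternative to walking the DOM by hand, roughly the same rule can be expressed as an XPath query. This is a swapped-in technique, not the cleanEmptyTextNodes() method used in this answer, and it is simpler: only element siblings (not CDATA or comment siblings) count as a reason to remove a whitespace text node. A minimal sketch; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XPathWhitespaceCleaner {

    // Whitespace-only text nodes whose parent also has at least one element child.
    // A lone whitespace child such as <a> </a> is not matched, so it is kept.
    private static final String EXPR = "//text()[normalize-space(.) = ''][../*]";

    static Document parseAndClean(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(EXPR, doc, XPathConstants.NODESET);
            // copy the matches first so the DOM is not modified while the result is read
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                toRemove.add(matches.item(i));
            }
            for (Node n : toRemove) {
                n.getParentNode().removeChild(n);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndClean("<r>\n  <a> </a>\n  <b>t</b>\n</r>");
        Element root = doc.getDocumentElement();
        // <r> keeps only its two element children; <a> keeps its lone " " text node
        System.out.println(root.getChildNodes().getLength());
        System.out.println(root.getFirstChild().getChildNodes().getLength());
    }
}
```

The cleaned document can then be passed to a Transformer with INDENT set to "yes", as in the question's code.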

The method cleanEmptyTextNodes(Node parentNode) below recursively removes these whitespace-only text nodes from a subtree.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class FormatXml {

    public static void main(String[] args) throws ParserConfigurationException,
            FileNotFoundException, SAXException, IOException,
            TransformerException {
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
                .newInstance();
        DocumentBuilder documentBuilder = docBuilderFactory
                .newDocumentBuilder();
        Document node = documentBuilder.parse(new FileInputStream("data.xml"));
        System.out.println(format(node, 4));
    }

    public static String format(Node node, int indent)
            throws TransformerException {
        cleanEmptyTextNodes(node);
        StreamResult result = new StreamResult(new StringWriter());
        getTransformer(indent).transform(new DOMSource(node), result);
        return result.getWriter().toString();
    }

    private static Transformer getTransformer(int indent) {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (Exception e) {
            throw new RuntimeException("Failed to create the Transformer", e);
        }
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                Integer.toString(indent));
        return transformer;
    }

    /**
     * Removes text nodes that contain only whitespace. Besides containing
     * only whitespace, the condition for removal is: if the parent node has
     * at least one child of any of the following types, all whitespace-only
     * text-node children are removed:
     *  - ELEMENT child
     *  - CDATA child
     *  - COMMENT child
     * The check is applied recursively to every element in the subtree.
     *
     * The purpose of this is to make the format() method (which uses a
     * Transformer for formatting) more consistent regarding indentation and
     * line breaks.
     */
    private static void cleanEmptyTextNodes(Node parentNode) {
        boolean removeEmptyTextNodes = false;
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            removeEmptyTextNodes |= checkNodeTypes(childNode);
            childNode = childNode.getNextSibling();
        }

        if (removeEmptyTextNodes) {
            removeEmptyTextNodes(parentNode);
        }
    }

    private static void removeEmptyTextNodes(Node parentNode) {
        Node childNode = parentNode.getFirstChild();
        while (childNode != null) {
            // grab the "nextSibling" before the child node is removed
            Node nextChild = childNode.getNextSibling();

            short nodeType = childNode.getNodeType();
            if (nodeType == Node.TEXT_NODE) {
                boolean containsOnlyWhitespace = childNode.getNodeValue()
                        .trim().isEmpty();
                if (containsOnlyWhitespace) {
                    parentNode.removeChild(childNode);
                }
            }
            childNode = nextChild;
        }
    }

    private static boolean checkNodeTypes(Node childNode) {
        short nodeType = childNode.getNodeType();

        if (nodeType == Node.ELEMENT_NODE) {
            cleanEmptyTextNodes(childNode); // recurse into subtree
        }

        if (nodeType == Node.ELEMENT_NODE
                || nodeType == Node.CDATA_SECTION_NODE
                || nodeType == Node.COMMENT_NODE) {
            return true;
        } else {
            return false;
        }
    }

}

The formatted output generated from your input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<BusinessEvents>
    <MailEvent>
        <to>Wellington</to>
        <weight>10.0</weight>
        <priority>air priority</priority>
        <volume>10.0</volume>
        <from>Christchurch</from>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <PPW>8.0</PPW>
        <PPV>2.5</PPV>
    </MailEvent>
    <DiscontinueEvent>
        <to>Wellington</to>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <from>Sydney</from>
    </DiscontinueEvent>
    <RoutePriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <duration>15.0</duration>
        <maxweight>40.0</maxweight>
        <maxvolume>20.0</maxvolume>
        <priority>air priority</priority>
        <company>Kiwi Co</company>
        <day>Mon May 20 14:30:08 NZST 2013</day>
        <frequency>3.0</frequency>
        <from>Wellington</from>
        <volumecost>2.0</volumecost>
    </RoutePriceUpdateEvent>
    <CustomerPriceUpdateEvent>
        <weightcost>3.0</weightcost>
        <to>Wellington</to>
        <priority>air priority</priority>
        <from>Sydney</from>
        <volumecost>2.0</volumecost>
    </CustomerPriceUpdateEvent>
</BusinessEvents>
answered 2013-05-20T14:01:04.793