java - Extract only the text of all SOAP XML nodes, using Java

Question

I have the following SOAP XML, and I want to extract the text content of all of its nodes:

<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope" soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding"
    xmlns:m="http://www.example.org/stock">
    <soap:Body>
        <m:GetStockName>
            <m:StockName>ABC</m:StockName>
        </m:GetStockName>
        <!--some comment-->
        <m:GetStockPrice>
            <m:StockPrice>10 \n </m:StockPrice>
            <m:StockPrice>\t20</m:StockPrice>
        </m:GetStockPrice>
    </soap:Body>
</soap:Envelope>

The expected output would be:

'ABC10 \n \t20'

I did the following with DOM:

public static String parseXmlDom() throws ParserConfigurationException,
        SAXException, IOException, FileNotFoundException {

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Configure the factory before creating the builder; settings applied
    // afterwards do not affect a builder that already exists
    factory.setNamespaceAware(true);
    factory.setIgnoringComments(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Read XML File
    String xml = IOUtils.toString(new FileInputStream(new File(
            "./files/request2.xml")), "UTF-8");
    InputSource is = new InputSource(new StringReader(xml));
    // Parse XML String to DOM
    Document doc = builder.parse(is);
    // Extract nodes text
    NodeList nodeList = doc.getElementsByTagNameNS("*", "*");
    Node node = nodeList.item(0);
    return node.getTextContent();
}

And with SAX:

public static String parseXmlSax() throws SAXException, IOException, ParserConfigurationException {

    // StringBuilder is sufficient here: the buffer is only used by this thread
    final StringBuilder sb = new StringBuilder();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();
    // Declare Handler
    DefaultHandler handler = new DefaultHandler() {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Called for every run of character data, including whitespace between elements
            sb.append(ch, start, length);
        }
    };
    // Parse XML
    saxParser.parse("./files/request2.xml", handler);
    return sb.toString();
}

With both approaches, the result still contains the line breaks and indentation between the elements.

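I know I can simply return sb.toString().replaceAll("\n", "").replaceAll("\t", ""); to get the expected result, but if my XML file is badly formatted, for example with extra spaces, the result will contain those extra spaces too.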

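I have also tried this approach of reading the XML as a single line before parsing it with SAX or DOM, but it does not work for my SOAP XML example, because it removes the whitespace between the attributes of soap:Envelope where there is a line break (before xmlns:m):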

<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope" soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding"xmlns:m="http://www.example.org/stock"><soap:Body><m:GetStockName><m:StockName>ABC</m:StockName></m:GetStockName><m:GetStockPrice><m:StockPrice>10 \n  </m:StockPrice><m:StockPrice>\t20</m:StockPrice></m:GetStockPrice></soap:Body></soap:Envelope>
[Fatal Error] :1:129: Element type "soap:Envelope" must be followed by either attribute specifications, ">" or "/>".

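How can I read only the text content of all nodes in a SOAP XML document, regardless of whether the file is on a single line or spread over several well-formatted or badly formatted lines (and also ignoring comments)?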
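
One possible direction, offered as a minimal sketch rather than a definitive answer: instead of stripping characters from the concatenated result, walk the DOM tree and skip every text node that consists only of whitespace. This assumes that whitespace-only text nodes are always formatting and never data. The class name SoapTextExtractor and the helpers extractText and collect are hypothetical names introduced for illustration; the sample document is the one from the question, minus the soap:encodingStyle attribute for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SoapTextExtractor {

    // Hypothetical helper: concatenates every text node that is not whitespace-only.
    static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The factory must be configured before the builder is created.
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder sb = new StringBuilder();
        collect(doc.getDocumentElement(), sb);
        return sb.toString();
    }

    // Depth-first walk over the tree in document order.
    private static void collect(Node node, StringBuilder sb) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                String text = child.getNodeValue();
                // Assumption: a text node made up only of whitespace is indentation, not data.
                if (!text.trim().isEmpty()) {
                    sb.append(text); // keep the node's own leading/trailing spaces
                }
            } else if (type == Node.ELEMENT_NODE) {
                collect(child, sb);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // "\\n" and "\\t" are the literal backslash sequences from the sample XML.
        String xml =
            "<soap:Envelope xmlns:soap=\"http://www.w3.org/2001/12/soap-envelope\"\n"
          + "    xmlns:m=\"http://www.example.org/stock\">\n"
          + "    <soap:Body>\n"
          + "        <m:GetStockName>\n"
          + "            <m:StockName>ABC</m:StockName>\n"
          + "        </m:GetStockName>\n"
          + "        <!--some comment-->\n"
          + "        <m:GetStockPrice>\n"
          + "            <m:StockPrice>10 \\n </m:StockPrice>\n"
          + "            <m:StockPrice>\\t20</m:StockPrice>\n"
          + "        </m:GetStockPrice>\n"
          + "    </soap:Body>\n"
          + "</soap:Envelope>";
        System.out.println("'" + extractText(xml) + "'");
    }
}
```

Because the filter looks at whole text nodes rather than at individual characters, the result should not depend on whether the file is indented, badly indented, or on a single line, and spaces inside a value such as "10 \n " are kept. The trade-off is that an element whose entire content is whitespace contributes nothing. The same idea can be applied with SAX, but characters() may deliver one run of text in several chunks, so the handler would need to buffer the chunks and run the whitespace check in startElement and endElement rather than per call.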
