apache - XPath search in .docx

Question

I want to read specific text from a sub-table in a .docx file. Is there an efficient way to do this, such as XPath traversal or a similar API supported in Java?

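Currently I read the .docx with Java and Apache POI (code snippet below), but this way I have to iterate over every node with the tag 'w:tr' and read each node's text value. Is there a way to retrieve the required data quickly with a search pattern such as XPath? Any input is highly appreciated.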

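    import java.io.File;
    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.NodeList;

    File myFile = new File( "D:\\XLS-Pages\\TestSherwin.docx" );
    // A .docx is a ZIP archive; the main body text lives in word/document.xml
    ZipFile docxFile = new ZipFile( myFile );
    ZipEntry documentXML = docxFile.getEntry( "word/document.xml" );
    InputStream documentXMLIS = docxFile.getInputStream( documentXML );
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    org.w3c.dom.Document doc = dbf.newDocumentBuilder().parse( documentXMLIS );

    // Every table row (w:tr), nested or not, is returned and must be walked by hand
    org.w3c.dom.Element tElement = doc.getDocumentElement();
    NodeList n = tElement.getElementsByTagName( "w:tr" );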
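Even without another library, the JDK's own javax.xml.xpath API can run an XPath query over the document.xml parsed above. The sketch below is plain JAXP, not POI or docx4j, and assumes "sub-table" means a table nested inside a table cell; the class DocxXPath, the helper evaluateTexts and the "w" prefix binding are illustrative choices. One detail matters: prefixed XPath only matches on a namespace-aware DOM, so setNamespaceAware(true) is required (the question's parser leaves it off).

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class: plain JAXP XPath over word/document.xml
public class DocxXPath {
    // WordprocessingML main namespace, bound below to the conventional "w" prefix
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text content of every node matched by the XPath expression. */
    public static List<String> evaluateTexts(InputStream documentXml, String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so that "w:" in the XPath resolves
        Document doc = dbf.newDocumentBuilder().parse(documentXml);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return W_NS.equals(uri) ? List.of("w").iterator() : List.<String>of().iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the question; adjust as needed
        try (ZipFile docx = new ZipFile(new File("D:\\XLS-Pages\\TestSherwin.docx"))) {
            ZipEntry entry = docx.getEntry("word/document.xml");
            try (InputStream in = docx.getInputStream(entry)) {
                // w:t text runs inside a table that itself sits inside another table
                for (String t : evaluateTexts(in, "//w:tbl//w:tbl//w:t")) {
                    System.out.println(t);
                }
            }
        }
    }
}
```

Changing the expression (for example to //w:tbl[2]//w:tr[1]//w:t) narrows the search without any manual iteration over w:tr nodes.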

score 1 · Accepted Answer


You can use XPath in docx4j; the support is built on JAXB's XPath support (via its Binder), with various limitations.
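A minimal sketch of the docx4j route, assuming docx4j 3.x or later plus a JAXB implementation on the classpath. Docx4jXPath and nestedTableTexts are hypothetical names, and the expression //w:tbl//w:tbl//w:t again assumes "sub-table" means a table nested in a table cell. getJAXBNodesViaXPath hands back the JAXB objects behind the matching nodes rather than DOM nodes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Text;

// Hypothetical helper class illustrating docx4j's XPath support
public class Docx4jXPath {
    /** Collects the text of w:t elements in tables nested inside other tables. */
    public static List<String> nestedTableTexts(WordprocessingMLPackage pkg) throws Exception {
        MainDocumentPart mainPart = pkg.getMainDocumentPart();
        // The XPath runs against a DOM associated with the JAXB objects through a JAXB Binder;
        // each matching DOM node is mapped back to its JAXB object.
        // false = do not re-marshal first; pass true if the content was changed in memory.
        List<Object> nodes = mainPart.getJAXBNodesViaXPath("//w:tbl//w:tbl//w:t", false);
        List<String> texts = new ArrayList<>();
        for (Object o : nodes) {
            Object unwrapped = XmlUtils.unwrap(o); // w:t may come back wrapped in a JAXBElement
            if (unwrapped instanceof Text) {
                texts.add(((Text) unwrapped).getValue());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the question; adjust as needed
        WordprocessingMLPackage pkg = WordprocessingMLPackage.load(new File("D:\\XLS-Pages\\TestSherwin.docx"));
        nestedTableTexts(pkg).forEach(System.out::println);
    }
}
```

Because the results are live objects in docx4j's model, they can also be modified and the package saved, which a read-only DOM query does not offer.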

answered 2013-06-27T20:47:05.267
