
I need to convert the contents of a .docx file to HTML text so that it can be displayed in a web UI.

I am using the Apache POI XWPFDocument class, but I have not gotten any result so far; all I get is an empty string. My code is based on this example.

Here is my code:

    public JSONObject uploadDocxFile(MultipartFile multipartFile) throws Exception {
        InputStream inputStream = multipartFile.getInputStream();
        XWPFDocument wordDocument = new XWPFDocument(inputStream);

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StringWriter stringWriter = new StringWriter();

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, new StreamResult(stringWriter));
        out.close();

        String result = new String(out.toByteArray());
        String htmlText = result;

        JSONObject jsonObject = new JSONObject();
        jsonObject.put("content", htmlText);
        jsonObject.put("success", true);
        return jsonObject;
    }

3 Answers


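Even though this is late, I think the code above can be modified as follows (it works for Word 97 documents, i.e. the binary .doc format):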

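    // Requires the poi and poi-scratchpad jars on the classpath.
    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    private static void convertWordDoc2HTML(File file)
            throws ParserConfigurationException, TransformerException, IOException {
        // Change the type from XWPFDocument to HWPFDocument.
        // Let an IOException propagate instead of continuing with a null document.
        HWPFDocument hwpfDocument;
        try (FileInputStream fis = new FileInputStream(file)) {
            hwpfDocument = new HWPFDocument(new POIFSFileSystem(fis));
        }

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Add the processDocument call; without it the HTML document stays empty.
        wordToHtmlConverter.processDocument(hwpfDocument);
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

        // Serialize the DOM into the same stream that is read back below.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

        // Decode with the same charset the serializer was told to use.
        String htmlText = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(htmlText);
    }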

I hope it helps.

answered 2015-04-28T07:57:22.710

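I am using docx4j to do this, and it seems to work. If you are using Maven, you only need to add the dependency (but use version 3.0.0) and then use the sample class named ConvertOutHtml.java. Just change the file path in ConvertOutHtml.java so that it points to your file, and that's it.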
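
For reference, the dependency entry would look roughly like this. The coordinates are the ones docx4j is published under on Maven Central as far as I know; the version is the one mentioned above, so adjust it if you need a newer release:

```xml
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>3.0.0</version>
</dependency>
```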

answered 2014-01-29T23:12:23.583

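Your code produces empty HTML output because you never process any document in the converter: the XWPFDocument is created but never handed to it.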
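
There also seems to be a second reason for the empty string in the question's code: the transform writes into `stringWriter`, but the result is then read from `out`, a `ByteArrayOutputStream` that nothing ever writes to. Here is a minimal JDK-only sketch of the serialization step on its own, with no POI involved. The class name `DomToHtmlString` and the method `paragraphToHtml` are made up for illustration; the point is only that the string is read from the same writer that was passed to `StreamResult`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
final class DomToHtmlString {

    // Builds a tiny DOM tree (<p>text</p>) and serializes it to an HTML string.
    static String paragraphToHtml(String text) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element paragraph = document.createElement("p");
            paragraph.setTextContent(text);
            document.appendChild(paragraph);

            StringWriter writer = new StringWriter();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.transform(new DOMSource(document), new StreamResult(writer));

            // Read from the same writer that was handed to StreamResult.
            return writer.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paragraphToHtml("Hello from a DOM tree"));
    }
}
```

In the question's method, the equivalent change would be to return `stringWriter.toString()` and drop the unused `out` stream, once a document is actually being processed.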

In any case, if the file is a docx, you should use XHTMLConverter rather than WordToHtmlConverter to convert it to HTML. See this answer.

answered 2016-01-27T11:35:30.997