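When converting a .doc file to HTML text with Apache POI's HWPFDocument class, I run into styling problems. Another problem is that the contents of the style tag are also converted into text, like this: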

.b1{white-space-collapsing:preserve;} .b2{margin: 1.1798611in 1.1798611in 1.1798611in 1.1798611in;} .s1{font-weight:bold;color:black;} .s2{color:black;} .s3{font-style:italic;color:black;} .p1{text-align:center;hyphenate:none;font-family:Times New Roman;font-size:12pt;} .p2{text-align:justify;hyphenate:none;font-family:Times New Roman;font-size:12pt;} .p3{text-align:end;hyphenate:none;font-family:Times New Roman;font-size:12pt;}

Main Title

Here is my code:

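import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;

// Load the uploaded .doc file and convert it into an HTML DOM.
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(multipartFile.getInputStream());
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

// Serialize the DOM as UTF-8 encoded HTML.
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();

// Decode with the same charset the serializer wrote, not the platform default.
String html = new String(out.toByteArray(), StandardCharsets.UTF_8);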

All I need is for the contents of the .doc file to be converted correctly into HTML text.
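For illustration, here is a minimal sketch of one way the stylesheet text could be kept out of the result. It assumes the converter places its CSS in a `<style>` element of the generated DOM and that the CSS shows up as visible text because whatever consumes the HTML strips or ignores tags. `StyleStripper`, `removeStyleElements`, and `toHtml` are hypothetical names that are not part of Apache POI; only standard JDK XML classes are used.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper (not an Apache POI class).
class StyleStripper {

    // Removes every <style> element from the DOM in place.
    // Assumption: the CSS lives in elements whose tag name is "style".
    static void removeStyleElements(Document doc) {
        NodeList styles = doc.getElementsByTagName("style");
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = styles.getLength() - 1; i >= 0; i--) {
            Node style = styles.item(i);
            style.getParentNode().removeChild(style);
        }
    }

    // Serializes the DOM to an HTML string using the "html" output method.
    static String toHtml(Document doc) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter writer = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }
}
```

With the code above, this would be called as `StyleStripper.removeStyleElements(htmlDocument)` before serializing. Note the trade-off: the generated elements refer to the stylesheet through class names such as `s1` and `p1`, so dropping the `<style>` element also drops the bold, italic, and alignment formatting those classes carry.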
