
How can I read images from an MS Office .doc file using Apache POI? I tried the following code, but it does not work.

try {
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\ImageDocument.doc"));
    Document document = new Document();
    OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/ImageDocumentPDF.pdf"));
    PdfWriter.getInstance(document, fileOutput);
    document.open();

    HWPFDocument hdocument=new HWPFDocument(fs);
    Range range=hdocument.getOverallRange();
    PdfPTable createTable;
    CharacterRun run;
    PicturesTable picture=hdocument.getPicturesTable();
    int picoffset=run.getPicOffset();
    for(int i=0;i<range.numParagraphs();i++) {
        run =range.getCharacterRun(i);
        if(picture.hasPicture(run)) {
            Picture pic=picture.extractPicture(run, true);
            byte[] picturearray=pic.getContent();
            com.itextpdf.text.Image image=com.itextpdf.text.Image.getInstance(picturearray);
            document.add(image);
        }
    }
}

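When I run the code above and print the picture offset, it shows -1, and picture.hasPicture(run) returns false even though the input file contains images.

Please help me find a solution. Thanks.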


2 Answers

2
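import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

// Returns the raw bytes of every image embedded in a .doc or .docx file.
// getMimeType(file) is a helper that is not shown here; it is expected to
// return an object whose getExtension() yields "doc" or "docx".
public static List<byte[]> extractImagesFromWord(File file) {
    if (!file.exists()) {
        return null; // nothing to read
    }
    // try-with-resources closes the input stream even if parsing fails
    try (InputStream in = new FileInputStream(file)) {
        List<byte[]> result = new ArrayList<byte[]>();
        String extension = getMimeType(file).getExtension();
        if ("docx".equals(extension)) {
            // OOXML format: XWPF lists all embedded pictures directly
            XWPFDocument doc = new XWPFDocument(in);
            for (XWPFPictureData picture : doc.getAllPictures()) {
                result.add(picture.getData());
            }
        } else if ("doc".equals(extension)) {
            // Binary format: read every entry of the pictures table
            // instead of probing individual character runs
            HWPFDocument doc = new HWPFDocument(in);
            for (Picture picture : doc.getPicturesTable().getAllPictures()) {
                result.add(picture.getContent());
            }
        }
        return result;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}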
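Because the method above returns only raw bytes, the caller no longer knows which format each image uses. The following is a minimal sketch of one way to guess a file extension from the leading "magic" bytes before saving the data. The class name ImageTypeSniffer and the method guessExtension are hypothetical names chosen for illustration, and only four common formats are recognized; anything else falls back to "bin".

```java
public class ImageTypeSniffer {

    // Leading signature bytes of a few common image formats.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF = {'G', 'I', 'F', '8'};
    private static final byte[] BMP = {'B', 'M'};

    // True if data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a file extension guessed from the signature, or "bin" if unknown.
    public static String guessExtension(byte[] data) {
        if (startsWith(data, PNG)) {
            return "png";
        }
        if (startsWith(data, JPEG)) {
            return "jpg";
        }
        if (startsWith(data, GIF)) {
            return "gif";
        }
        if (startsWith(data, BMP)) {
            return "bmp";
        }
        return "bin";
    }
}
```

Each byte[] in the returned list could then be written to a file named, for example, "image" + index + "." + ImageTypeSniffer.guessExtension(bytes). Word documents can also embed formats such as EMF or WMF, which this sketch reports as "bin".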
answered 2014-03-20T09:26:53.293
0

It works for me. If getPicOffset() returns -1, the current CharacterRun contains no image.
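Since only some runs carry a picture, every character run has to be checked. Note that the loop in the question is bounded by range.numParagraphs() while it indexes character runs, so runs beyond that count are never inspected. Below is a minimal sketch of a per-run scan, assuming Apache POI with the HWPF classes is on the classpath; the class name DocRunPictures and the method picturesByRun are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class DocRunPictures {

    // Collects the bytes of every picture found in a character run.
    public static List<byte[]> picturesByRun(File docFile) throws IOException {
        List<byte[]> images = new ArrayList<>();
        try (InputStream in = new FileInputStream(docFile)) {
            HWPFDocument doc = new HWPFDocument(in);
            PicturesTable pictures = doc.getPicturesTable();
            Range range = doc.getOverallRange();
            // Bound the loop by the number of character runs, not paragraphs.
            for (int i = 0; i < range.numCharacterRuns(); i++) {
                CharacterRun run = range.getCharacterRun(i);
                // Runs without an image are skipped.
                if (pictures.hasPicture(run)) {
                    Picture pic = pictures.extractPicture(run, true);
                    images.add(pic.getContent());
                }
            }
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path to a .doc file.
        List<byte[]> images = picturesByRun(new File(args[0]));
        System.out.println("Found " + images.size() + " picture(s)");
    }
}
```

If the order of the pictures within the text does not matter, doc.getPicturesTable().getAllPictures() avoids the per-run scan entirely.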

answered 2014-02-07T11:59:07.503