我正在使用 iText v5.5.1 读取 PDF 并从中渲染文本:
pdfReader = new PdfReader(new CloseShieldInputStream(is));
pdfParser = new PdfReaderContentParser(pdfReader);
int maxPageNumber = pdfReader.getNumberOfPages();
int pageNumber = 1;
StringBuilder sb = new StringBuilder();
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
while (pageNumber <= maxPageNumber) {
pdfParser.processContent(pageNumber, extractionStrategy);
sb.append(extractionStrategy.getText());
pageNumber++;
}
在一个 PDF 文件上引发以下异常:
java.lang.ClassCastException: com.itextpdf.text.pdf.PdfNumber cannot be cast to com.itextpdf.text.pdf.PdfLiteral
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:382)
at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:80)
该 PDF 文件似乎已损坏,但也许它的内容仍然有意义......