I want to read the content of the PDF that opens at this URL: https://dms.careerbuilder.com/viewer?Token=4aeea5b52d6e48a7beca13a992540a66&key=7b6184962856e016a5cdfcb3e27c7c30b34b5caaa6607d7d4e408f4b2ebf9dfd
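import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;    // PDFBox 2.x
import org.apache.pdfbox.text.PDFTextStripper;
import org.testng.Assert;                      // assumed; org.junit.Assert has the same assertTrue/fail

// perfecturl holds the viewer URL shown above
try {
    String pdfContent = readPdfContent(perfecturl);
    Assert.assertTrue(pdfContent.contains("Test Kumar"));
    Assert.assertTrue(pdfContent.contains("XXXXX"));
} catch (IOException e) {
    // MalformedURLException is a subclass of IOException, so one catch covers both.
    // Fail explicitly: only printing the stack trace would let the test pass silently.
    Assert.fail("Could not read PDF from " + perfecturl + ": " + e);
}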
public static String readPdfContent(String url) throws IOException {
    URL pdfUrl = new URL(url);
    // try-with-resources closes the network stream and the document even if parsing fails
    try (InputStream in = new BufferedInputStream(pdfUrl.openStream());
         PDDocument doc = PDDocument.load(in)) {
        int numberOfPages = getPageCount(doc);
        System.out.println("The total number of pages " + numberOfPages);
        // the text is extracted before the document is closed
        return new PDFTextStripper().getText(doc);
    }
}
public static int getPageCount(PDDocument doc) {
    // get the total number of pages in the PDF document
    return doc.getNumberOfPages();
}
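Because `readPdfContent` passes the network stream straight into PDFBox, there is no way to see what the server actually sent. A minimal diagnostic sketch, assuming JDK 11+ (`java.net.http.HttpClient`): it downloads the whole body into memory, follows redirects, prints the status, `Content-Type` and size, and checks for the `%PDF-` signature every PDF file starts with. The browser-like `User-Agent` header is only a guess at what the viewer endpoint might expect, and `PdfFetchCheck`, `fetchBytes` and `startsWithPdfHeader` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfFetchCheck {

    // A PDF file begins with the ASCII bytes "%PDF-" followed by its version
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the buffer starts exactly with the PDF signature
    public static boolean startsWithPdfHeader(byte[] data) {
        return data != null
                && data.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Downloads the whole body into memory (following redirects) and prints
    // the status, Content-Type and size so the response can be inspected
    public static byte[] fetchBytes(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Assumption: the endpoint may answer differently without a browser-like User-Agent
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        byte[] body = response.body();
        System.out.println("Status: " + response.statusCode());
        System.out.println("Content-Type: "
                + response.headers().firstValue("Content-Type").orElse("(none)"));
        System.out.println("Body length: " + body.length + " bytes");
        return body;
    }
}
```

If `startsWithPdfHeader(fetchBytes(perfecturl))` is true, the same bytes can be handed to `PDDocument.load(byte[])` in PDFBox 2.x (`Loader.loadPDF(byte[])` in 3.x); if it is false, the printed `Content-Type` and length show what came back instead of a PDF.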
It throws this exception:
Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1093)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2580)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2551)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128)
PDFBox cannot read the PDF, even though this URL displays a valid PDF when opened. Can anyone help me solve this?
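The trace above is consistent with a response that is not a PDF file at all. As far as I can tell from the PDFBox 2.x sources, `COSParser.parseHeader` keeps reading lines looking for `%PDF-`, and `BaseParser.readLine` throws `End-of-File, expected line` once the input runs out first. That happens for an empty body, and can also happen for an HTML page (which a `/viewer?...` URL plausibly serves) with no such line. A rough sketch of a hypothetical `ResponseKind.classify` helper that sniffs the first bytes of a downloaded body to tell these cases apart; it is a heuristic only and does not validate the rest of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseKind {

    // Rough guess at what a downloaded body contains, based on its first bytes
    public static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return "EMPTY"; // nothing for PDFBox to parse
        }
        // ISO-8859-1 maps every byte to one char, so binary data decodes without errors
        String head = new String(body, 0, Math.min(body.length, 512), StandardCharsets.ISO_8859_1)
                .stripLeading()
                .toLowerCase(Locale.ROOT);
        if (head.startsWith("%pdf-")) {
            return "PDF";
        }
        if (head.startsWith("<!doctype html") || head.startsWith("<html")) {
            return "HTML"; // e.g. a browser viewer page instead of the file itself
        }
        return "UNKNOWN";
    }
}
```

An `EMPTY` or `HTML` result would mean the problem lies in what the URL returns to a non-browser client, not in PDFBox's parsing of a valid PDF.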