I've written some code to perform OCR on a PDF using Tesseract (Tess4J):
    public void DoOCRAnalyse(String From) throws FileNotFoundException {
        Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
        File[] files = PdfUtilities.convertPdf2Png(new File(From));
        for (File f : files) {
            try {
                String result = instance.doOCR(f);
                /* String result = instance.doOCR(take File or BufferedImage); */
                SearchForSVHC(result, SvhcList);
            } catch (TesseractException e) {
                System.err.println(e.getMessage());
            }
        }
    }
It recognizes the text, which is great, but my problem is that it needs the page images to exist as files in a directory on disk. How can I pass a BufferedImage or File to the doOCR() method without needing the files on disk?