pdf - Tess4J doOCR() for First Page of pdf / tif

Question

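Is there a way to tell Tess4J to OCR only a certain number of pages or characters?

I will potentially be working with 200+ page PDFs, but I really only want to OCR the first page, if that!

As far as I understand, the common sample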

package net.sourceforge.tess4j.example;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) {
        File imageFile = new File("eurotext.tif");
        Tesseract instance = Tesseract.getInstance();  // JNA Interface Mapping
        // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

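will attempt to OCR the entire 200+ page document into a single String.

For my particular case, that is far more than I need. I am worried that it could take a very long time if I let it process all 200+ pages and then just take a substring of the first 500 or so characters.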

score 1 · Accepted Answer


The library has a PdfUtilities class that can extract specific pages of a PDF.
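A minimal sketch of that approach, not taken from the original answer: copy page 1 into a temporary PDF and run OCR on that file alone, so the remaining pages are never rendered or recognized. It assumes Tess4J 3.x or later, where the class is net.sourceforge.tess4j.util.PdfUtilities, splitPdf takes (File, File, int, int) with 1-based page numbers, and Tesseract has a public constructor; older releases may differ. The class name FirstPageOcr and the helpers ocrFirstPage and firstChars are hypothetical names used only for illustration.

```java
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.util.PdfUtilities;

public class FirstPageOcr {

    /** Returns at most maxChars leading characters of text. */
    public static String firstChars(String text, int maxChars) {
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    /** Copies page 1 of the PDF into a temporary file and OCRs only that file. */
    public static String ocrFirstPage(ITesseract ocr, File pdf) throws Exception {
        File firstPage = File.createTempFile("first-page", ".pdf");
        try {
            // Assumed signature (Tess4J 3.x+): splitPdf(input, output, firstPage, lastPage).
            PdfUtilities.splitPdf(pdf, firstPage, 1, 1);
            return ocr.doOCR(firstPage);
        } finally {
            // Best-effort cleanup of the temporary single-page PDF.
            firstPage.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: FirstPageOcr <file.pdf>");
            return;
        }
        ITesseract ocr = new Tesseract();
        String text = ocrFirstPage(ocr, new File(args[0]));
        // Keep only the first 500 characters, as in the question.
        System.out.println(firstChars(text, 500));
    }
}
```

For a multi-page TIFF, a comparable route would be to load only the first frame as a BufferedImage (ImageIO.read returns the first image of a file when a TIFF reader is installed) and pass it to the doOCR overload that accepts a BufferedImage.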

Answered 2014-10-23T00:16:11.937
