I am using Tesseract 3.0.2 with tess4j 1.4.1. It does not work in a thread-safe way, and I get an NPE. I am using Grizzly/Jersey/Spring.

@Service("textExtractorService")
public class TextExtractorServiceImpl implements TextExtractorService {
    private static final Logger LOGGER = Logger.getLogger(TextExtractorServiceImpl.class);

    private final Tesseract instance = Tesseract.getInstance(); // JNA Interface

... ..

    }

……

    public ExtractedInfo extract(BufferedImage bufferedImage)
            throws IOException {
        ExtractedInfo extractedInfo = new ExtractedInfo();
        try {
            BufferedImage preProcessed = preProcess(bufferedImage);

            // The call below gives me the NPE when multiple threads call this method.
            String result = instance.doOCR(preProcessed);

            String[] r = StringUtils.split(result, "\n");
            extractedInfo.setRawText(r);
        } catch (TesseractException e) {
            throw new IOException(e);
        }
        return extractedInfo;
    }

……

Full stack trace:
SEVERE: service exception:
javax.servlet.ServletException: java.lang.Error: Invalid memory access
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:420)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at com.sun.grizzly.http.servlet.ServletAdapter$FilterChainImpl.doFilter(ServletAdapter.java:1059)
    at com.sun.grizzly.http.servlet.ServletAdapter$FilterChainImpl.invokeFilterChain(ServletAdapter.java:999)
    at com.sun.grizzly.http.servlet.ServletAdapter.doService(ServletAdapter.java:434)
    at com.sun.grizzly.http.servlet.ServletAdapter.service(ServletAdapter.java:379)
    at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:179)
    at com.sun.grizzly.tcp.http11.GrizzlyAdapterChain.service(GrizzlyAdapterChain.java:196)
    at com.sun.grizzly.tcp.http11.GrizzlyAdapter.service(GrizzlyAdapter.java:179)
    at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:850)
    at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:747)
    at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1032)
    at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:231)
    at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
    at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
    at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
    at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
    at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
    at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
    at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
    at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
    at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Invalid memory access
    at com.sun.jna.Native.invokeVoid(Native Method)
    at com.sun.jna.Function.invoke(Function.java:367)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    at com.sun.proxy.$Proxy55.TessBaseAPIDelete(Unknown Source)
    at net.sourceforge.tess4j.Tesseract.dispose(Tesseract.java:346)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:242)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:200)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:184)
    at com.vanitysoft.thirdeye.service.impl.TextExtractorServiceImpl.extract(TextExtractorServiceImpl.java:69)
    at com.vanitysoft.thirdeye.web.TextExtractorResource.extract(TextExtractorResource.java:49)

1 Answer

I'm not sure whether this is exactly the same problem, but I found this answer to a similar question.

https://stackoverflow.com/a/24806132/2596497

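In short, the underlying Tesseract engine does not appear to support multithreading. That would fit the stack trace above, where the failure occurs in TessBaseAPIDelete, called from Tesseract.dispose() inside doOCR, while several request threads share the single instance returned by Tesseract.getInstance().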
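
If the engine really cannot be entered by more than one thread at a time, one possible workaround is to serialize the calls to doOCR. The following is only a minimal sketch of that idea, not something taken from the linked answer or from the Tess4J API: `OcrEngine`, `SerializedOcr`, `CountingEngine`, and `SerializedOcrDemo` are hypothetical names, and `OcrEngine` stands in for the one Tess4J call used in the question. Whether a lock alone is sufficient for tess4j 1.4.1 is an assumption that would need testing.

```java
import java.awt.image.BufferedImage;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the single call used in the question (doOCR).
// In the real service, an implementation would delegate to the Tess4J instance.
interface OcrEngine {
    String doOCR(BufferedImage image) throws Exception;
}

// Wraps an engine so that at most one thread is inside doOCR at any time.
final class SerializedOcr implements OcrEngine {
    private final OcrEngine delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedOcr(OcrEngine delegate) {
        this.delegate = delegate;
    }

    @Override
    public String doOCR(BufferedImage image) throws Exception {
        lock.lock();
        try {
            return delegate.doOCR(image);
        } finally {
            lock.unlock(); // always release, even if the engine throws
        }
    }
}

// Fake engine for the demo: records the highest number of simultaneous callers.
final class CountingEngine implements OcrEngine {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    @Override
    public String doOCR(BufferedImage image) throws InterruptedException {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        Thread.sleep(5); // simulate some work inside the engine
        active.decrementAndGet();
        return "text";
    }

    int peak() {
        return peak.get();
    }
}

class SerializedOcrDemo {
    // Calls the wrapped fake engine from several threads and returns the peak concurrency seen.
    static int peakWith(int threads) {
        CountingEngine fake = new CountingEngine();
        OcrEngine safe = new SerializedOcr(fake);
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    safe.doOCR(img);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return fake.peak();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent calls: " + peakWith(8));
    }
}
```

In the service from the question, the lambda `image -> instance.doOCR(image)` could serve as the delegate. The cost is that OCR requests are then processed one at a time, so throughput is limited to a single engine call at any moment.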

Answered 2015-03-10T14:08:58.570