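I built an OCR application that converts image files to Doc files, using Tesseract as its OCR engine through the Tess4J JNA wrappers. During development I put the DLL files and the language data (tessdata) in the project's bin folder, and the application ran fine. Now, when I export the project, the DLL files and tessdata are not included in the JAR, so the program does not work. I tried two export options:
**1. Package required libraries into generated JAR**
I put the DLL files and the tessdata folder in the same directory as the JAR file, but it did not run.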
http://i.imgur.com/cGwiVFC.png
It gave me the following error:
F:\New folder>java -jar w.jar scan.jpg
Error opening data file bin//tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent d
irectory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoa
der.java:58)
Caused by: java.util.ServiceConfigurationError: javax.imageio.spi.ImageInputStre
amSpi: Provider com.sun.media.imageioimpl.stream.ChannelImageInputStreamSpi coul
d not be instantiated: java.lang.IllegalArgumentException: vendorName == null!
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIOReg
istry.java:210)
at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159
)
at javax.imageio.ImageIO.<clinit>(ImageIO.java:65)
at net.sourceforge.vietocr.ImageIOHelper.getImageByteBuffer(Unknown Sour
ce)
at net.sourceforge.tess4j.Tesseract.setImage(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.shaurya.back.OCR.TesseractEngine.getResult(TesseractEngine.java:2
0)
at com.shaurya.back.ImageToDocument.identify(ImageToDocument.java:117)
at com.shaurya.back.ImageToDocument.transform(ImageToDocument.java:53)
at com.shaurya.front.runnow.main(runnow.java:27)
... 5 more
Caused by: java.lang.IllegalArgumentException: vendorName == null!
at javax.imageio.spi.IIOServiceProvider.<init>(IIOServiceProvider.java:7
6)
at javax.imageio.spi.ImageInputStreamSpi.<init>(ImageInputStreamSpi.java
:90)
at com.sun.media.imageioimpl.stream.ChannelImageInputStreamSpi.<init>(Ch
annelImageInputStreamSpi.java:63)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstruct
orAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingC
onstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:374)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 19 more
F:\New folder>
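**2. Copy required libraries into a sub-folder next to the generated JAR**
Here, too, I copied the DLL files and the tessdata folder into the same directory as the JAR file. (If I copy them into the sub-folder that contains the libraries, it cannot even find the DLL files.)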
http://i.imgur.com/7ShF3Ev.png
The error it gives is:
F:\New folder\kol>java -jar runn.jar scan.jpg
Error opening data file bin//tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent d
irectory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:470)
at com.sun.jna.Function.invoke(Function.java:404)
at com.sun.jna.Function.invoke(Function.java:315)
at com.sun.jna.Library$Handler.invoke(Library.java:212)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.shaurya.back.OCR.TesseractEngine.getResult(TesseractEngine.java:2
0)
at com.shaurya.back.ImageToDocument.identify(ImageToDocument.java:117)
at com.shaurya.back.ImageToDocument.transform(ImageToDocument.java:53)
at com.shaurya.front.runnow.main(runnow.java:27)
F:\New folder\kol>
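So the main problem seems to be that the tessdata folder cannot be found, even though the DLLs are. The other thing I am curious about is why the exception stack traces differ between the two cases. That seems odd, since both run the same code and hit the same problem; only the packaging is different.
Edit 1:
Even if I move the DLLs and tessdata out of bin into another folder and add that folder as an External Class Folder under Java Build Path -> Libraries, it does not work. When I do that, I get the same tessdata-not-found error (in the application itself).
Edit 2: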
instance.setDatapath("bin//tessdata");
This is what I have set as my data path. Maybe changing it somehow would fix the error?
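A relative path such as `bin//tessdata` is resolved against the working directory, so it only exists when the program is started from the project folder. One thing that might be worth trying is to build an absolute path from the JAR's own location instead. Below is a minimal sketch using only the standard library. `TessdataLocator` and its two methods are hypothetical names, not part of Tess4J, and the sketch assumes the `tessdata` folder sits next to the JAR and that `setDatapath` takes the `tessdata` directory itself, as in the call above. With the Eclipse jar-in-jar loader the code source may not be a plain file URL, so the sketch falls back to the working directory in that case.

```java
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

// Hypothetical helper: not part of Tess4J.
final class TessdataLocator {

    // Returns the folder holding the JAR (or the class output folder in the IDE).
    // Falls back to the working directory when the location is not a file URL.
    static Path codeSourceDirectory(Class<?> anchor) {
        try {
            CodeSource source = anchor.getProtectionDomain().getCodeSource();
            if (source != null && source.getLocation() != null
                    && "file".equals(source.getLocation().getProtocol())) {
                Path location = Paths.get(source.getLocation().toURI());
                return Files.isDirectory(location) ? location : location.getParent();
            }
        } catch (URISyntaxException | RuntimeException e) {
            // Ignore and use the fallback below.
        }
        return Paths.get("").toAbsolutePath();
    }

    // Returns the absolute path of <baseDir>/tessdata, failing early with a
    // readable message if the English language file is not there.
    static Path resolveTessdata(Path baseDir) {
        Path tessdata = baseDir.resolve("tessdata");
        if (!Files.isRegularFile(tessdata.resolve("eng.traineddata"))) {
            throw new IllegalStateException(
                    "eng.traineddata not found under " + tessdata.toAbsolutePath());
        }
        return tessdata.toAbsolutePath();
    }

    public static void main(String[] args) {
        Path base = codeSourceDirectory(TessdataLocator.class);
        try {
            Path tessdata = resolveTessdata(base);
            // Assumed usage: instance.setDatapath(tessdata.toString());
            System.out.println("tessdata found at " + tessdata);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking for `eng.traineddata` up front means a wrong path shows up as a clear Java exception rather than a native error later on.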
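Sorry if there are formatting problems in this post. The Stack Overflow "Ask a Question" page is not showing a preview or any formatting buttons right now. If it looks wrong once posted, I will edit it :)
-Shaurya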