I am using PDFBox and Tika to index the content of PDF files. Everything worked fine with PDFBox 1.8, but after updating PDFBox to 2.0.2 I get the following error:
(Thread-62 (HornetQ-client-global-threads-2071379348)) Exception while creating solr doucment for content::Failed to close temporary resources: org.apache.tika.exception.TikaException: Failed to close temporary resources
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.hornetq.jms.client.JMSMessageListenerWrapper.onMessage(JMSMessageListenerWrapper.java:91)
at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:983)
at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:48)
at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1113)
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Could not delete temporary file C:\Users\FILESE~1\AppData\Local\Temp\apache-tika-7918716906396425097.tmp
at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70)
at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121)
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:150)
... 18 more
Can you help me solve this problem?
The only change I made was updating PDFBox to 2.0.2.
My Gradle dependencies are:
compile "org.apache.poi:poi:3.8"
compile "org.apache.poi:poi-ooxml:3.8"
compile "org.apache.poi:poi-scratchpad:3.8"
compile "org.apache.pdfbox:pdfbox:2.0.2"
compile 'org.apache.tika:tika-parsers:1.5'
compile 'org.apache.tika:tika-core:1.5'
Here I am using Tika 1.5, and as far as I can tell this version supports PDFBox 2.0.3. You can see that here
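For context on the `Could not delete temporary file` cause in the trace above: on Windows, this kind of failure often means a stream or file handle on the temp file was still open when the delete was attempted. I have not confirmed that this is what happens inside Tika or PDFBox in my setup, so treat it as an assumption. Below is a minimal JDK-only sketch of the pattern I would expect to work (close every stream, then delete). The class and method names are hypothetical and are not part of Tika or PDFBox.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileCleanup {

    // Hypothetical helper: writes data to a temp file, reads it back through
    // a stream, then deletes the file. Returns the path of the deleted file.
    static Path processAndDelete(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("demo-", ".tmp");
        Files.write(tmp, data);

        long total = 0;
        // try-with-resources guarantees the stream is closed before we
        // reach the delete call below, even if reading throws.
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        if (total != data.length) {
            throw new IOException("Unexpected byte count: " + total);
        }

        // Files.delete throws an IOException if the file cannot be removed,
        // for example when another handle still holds it open on Windows.
        Files.delete(tmp);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path p = processAndDelete("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Temp file still exists: " + Files.exists(p));
    }
}
```

If the same close-then-delete order is already followed in my listener code, then the open handle (if that is the cause) would have to come from inside the parser libraries, which would point at the version mix in the dependencies above.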