
I am running Apache Tika 1.24.1 as follows:

java -Xmx3G -Djava.io.tmpdir=/mytmp/tmp -jar tika-server.jar -spawnChild -taskPulseMillis 240000 --host=hostname.domain.com

Can the maximum array length be changed so that this error no longer occurs?

org.apache.tika.exception.TikaException: Unexpected RuntimeException 
from org.apache.tika.parser.microsoft.OfficeParser@4fab9c0a
...
...
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate 
an array of length 1835606, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request 
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with 
IOUtils.setByteArrayMaxOverride()
at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
...
at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
1 Answer

Please add the following configuration to your tika-config.xml and let us know whether it works.

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <param name="byteArrayMaxOverride" type="int">2000000</param>
            </params>
        </parser>
    </parsers>
</properties>
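
A possible variant, offered as a sketch rather than a verified fix: Tika's configuration format allows a parser to be excluded from DefaultParser with a parser-exclude element, which keeps OfficeParser from being registered twice (once through DefaultParser with default settings and once with the override). The value 2000000 is taken from the configuration above. I have not confirmed that Tika 1.24.1 recognizes the byteArrayMaxOverride parameter, so treat that as an assumption to check against your release.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <!-- DefaultParser handles every format except those claimed by OfficeParser,
             so OfficeParser is only registered once, with the override below. -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
        </parser>
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <!-- Must exceed the 1835606 bytes reported in the stack trace. -->
                <param name="byteArrayMaxOverride" type="int">2000000</param>
            </params>
        </parser>
    </parsers>
</properties>
```

The server only reads this file if it is told where it is. In tika-server 1.x that is done with the --config (or -c) option placed after -jar tika-server.jar, for example --config=/path/to/tika-config.xml, where the path is a placeholder. If the server rejects the parameter at startup, a newer Tika release may be needed.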
Answered 2020-10-06T07:21:48.137