What is wrong with this code? I am trying to parse PDF files and extract text from them. For some PDFs the text is extracted fine, but for others it throws an error:
Invalid dictionary, found: '' but expected: '/'
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@67fb878
Also, for some PDFs I get no metadata values in the md variable, while for others I do.
Here is my code. Is there a problem with the byte array?
private BinaryParser binaryParser;

binaryParser.parse(page.getBinaryData());

public void parse(byte[] data) {
    InputStream is = null;
    try {
        is = new ByteArrayInputStream(data);
        text = null;
        Metadata md = new Metadata();
        metaData = new HashMap<String, String>();
        text = tika.parseToString(is, md).trim();
        processMetaData(md);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(is);
    }
}

private void processMetaData(Metadata md) {
    if ((getMetaData() == null) || (!getMetaData().isEmpty())) {
        setMetaData(new HashMap<String, String>());
    }
    for (String name : md.names()) {
        getMetaData().put(name.toLowerCase(), md.get(name));
    }
}
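
To narrow down whether the byte array itself is the problem, one thing I could try is checking that the data actually looks like a PDF before passing it to parse(). Below is a minimal sketch of such a check, not a fix. The class name PdfHeaderCheck and the method looksLikePdf are hypothetical helpers of my own and not part of Tika. It assumes the "%PDF-" marker should appear somewhere in the first 1024 bytes, which is a leniency some readers apply; a file that passes this check can still be damaged further in and still fail to parse.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper (not a Tika API): reports whether a byte
// array carries the "%PDF-" header marker near its start.
final class PdfHeaderCheck {

    // ASCII bytes of the marker that opens a PDF file.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Assumption: only the first 1024 bytes are searched for the marker.
    private static final int SEARCH_WINDOW = 1024;

    private PdfHeaderCheck() {
    }

    // Returns true if the marker occurs within the search window.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        int lastStart = Math.min(data.length, SEARCH_WINDOW) - PDF_MAGIC.length;
        for (int start = 0; start <= lastStart; start++) {
            if (matchesAt(data, start)) {
                return true;
            }
        }
        return false;
    }

    // Compares the marker against the data beginning at the given offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[offset + i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling PdfHeaderCheck.looksLikePdf(page.getBinaryData()) before binaryParser.parse(...) and logging the result for the failing pages would at least show whether those byte arrays are empty, truncated at the start, or not PDFs at all (for example an HTML error page saved in place of the file).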