java - Apache Tika: Parsing only metadata without content extraction

Asked 2012-02-08T10:43:34.260

Viewed 2620 times

I'm using Apache Tika to extract metadata from documents. I'm mostly interested in populating a basic Dublin Core record (author, title, date, and so on), and I'm not interested in the content of the documents at all. Currently I'm simply doing the usual thing:

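 import java.io.FileInputStream;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.parser.AutoDetectParser;
 import org.apache.tika.sax.BodyContentHandler;
 import org.xml.sax.ContentHandler;

 FileInputStream fis = new FileInputStream( uploadedFileLocation );
 // Tika parsing: BodyContentHandler buffers the extracted body text
 Metadata metadata = new Metadata();
 ContentHandler handler = new BodyContentHandler();
 AutoDetectParser parser = new AutoDetectParser();
 parser.parse(fis, handler, metadata);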

Is there a way to tell Tika not to parse the content? I'm hoping this would speed things up and save memory.
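
For illustration, one possible approach is to pass a handler that ignores every content event. This is a minimal sketch, not a confirmed answer: it assumes that a plain `org.xml.sax.helpers.DefaultHandler` (whose callbacks are no-ops) is acceptable in place of `BodyContentHandler`, so no body text is buffered. The class name `MetadataOnly` and the helper `extract` are hypothetical names introduced here. The parser still has to read the document, so any speed-up likely depends on the file format; the saving is mainly the memory for the extracted text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MetadataOnly {

    // Hypothetical helper: runs the parser but discards all content events.
    public static Metadata extract(InputStream in)
            throws IOException, SAXException, TikaException {
        Metadata metadata = new Metadata();
        AutoDetectParser parser = new AutoDetectParser();
        // DefaultHandler implements ContentHandler with empty callbacks,
        // so the body text is never collected in memory.
        parser.parse(in, new DefaultHandler(), metadata, new ParseContext());
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path of the document to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Metadata metadata = extract(in);
            for (String name : metadata.names()) {
                System.out.println(name + " = " + metadata.get(name));
            }
        }
    }
}
```

Which metadata keys come back (author, title, dates) depends on the document type and on the Tika parsers available on the classpath.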


0 Answers
