I'm using Apache Tika to extract metadata from documents. I'm mostly interested in populating a basic Dublin Core record (author, title, date, etc.) and not in the content of the documents at all. Currently I'm simply doing the usual thing:

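 // Tika parsing; try-with-resources closes the stream once parsing is done
 try (FileInputStream fis = new FileInputStream(uploadedFileLocation)) {
     Metadata metadata = new Metadata();
     // BodyContentHandler collects the extracted body text, which I don't need
     ContentHandler handler = new BodyContentHandler();
     AutoDetectParser parser = new AutoDetectParser();
     parser.parse(fis, handler, metadata);
 }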

Is there some way to tell Tika to not parse the content? I'm hoping that this will speed things up as well as save memory.
