I'm trying to parse a RTF file using Apache Tika. Inside the file there is a table with several columns.
The problem is that the parser writes out the result without any information in which column the value was.
What I'm doing right now is:
AutoDetectParser adp = new AutoDetectParser(tc);
Metadata metadata = new Metadata();
String mimeType = new Tika().detect(file);
metadata.set(Metadata.CONTENT_TYPE, mimeType);
BodyContentHandler handler = new BodyContentHandler();
InputStream fis = new FileInputStream(file);
adp.parse(fis, handler, metadata, new ParseContext());
fis.close();
System.out.println(handler.toString());
It works but I need to know like meta-information.
Is there already a Handler which outputs something like HTML with a structure of the read RTF file?