parsing - Can't parse a PDF with Tika 1.3 (+ Lucene 4.2)

Translated from: https://stackoverflow.com/questions/16424934 2013-05-07T17:19:16.190

Viewed 129 times

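I'm trying to parse a PDF file and get its metadata and text, but I still don't get the expected result. I'm sure it's a silly mistake, but I can't see it. The file d.pdf exists and sits in the project's root folder. The imports are correct as well.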

public class MultiParse {
      public static void main(final String[] args) throws IOException,
                  SAXException, TikaException {
            Parser parser = new AutoDetectParser();
            File f = new File("d.pdf");        
            System.out.println("------------ Parsing a PDF:");
            extractFromFile(parser, f);
      }

      private static void extractFromFile(final Parser parser,
                  final File f ) throws IOException, SAXException,
                  TikaException {
            BodyContentHandler handler = new BodyContentHandler(10000000);
            Metadata metadata = new Metadata();
            InputStream is = TikaInputStream.get(f);
            parser.parse(is, handler, metadata, new ParseContext());
            for (String name : metadata.names()) {
                  System.out.println(name + ":\t" + metadata.get(name));
            }
      }
}

Output: no errors, but... not much either :(

------------ Parsing a PDF:
Content-Type:   application/pdf
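
One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: when only tika-core is on the classpath, AutoDetectParser can still detect the media type but has no PDF parser to delegate to, which would give output like the above. The sketch below is a plain-Java check (the class name ClasspathCheck and the helper isOnClasspath are made up for illustration) that reports whether the Tika PDF parser and PDFBox classes can be found at runtime.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class itself, or something it depends on, is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.tika.parser.AutoDetectParser", // shipped in tika-core
            "org.apache.tika.parser.pdf.PDFParser",    // shipped in tika-parsers
            "org.apache.pdfbox.pdmodel.PDDocument"     // PDFBox, used by the PDF parser
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If the second or third entry is reported as MISSING, adding the tika-parsers artifact and its dependencies to the project would be the first thing to try. Note also that the code in the question never prints the handler, so the extracted text would not appear in the output even when parsing succeeds.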
