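I am using POI to extract data from an Excel file. (The 5th column of the sheet contains the names of files that exist on my file system.) I loop over the rows of the sheet, using POI to extract the cell contents, and for each row I create a Tika instance and parse the file named in the 5th column with Tika's "parseToString(file)". When the file is an Office document (Excel, PowerPoint, Word), I get this error: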

Exception in thread "AWT-EventQueue-0" java.lang.NoSuchFieldError: filesystem
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:185)
    at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:131)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:61)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.Tika.parseToString(Tika.java:357)
    at org.apache.tika.Tika.parseToString(Tika.java:423)
    at org.apache.tika.Tika.parseToString(Tika.java:403)
    at HP.BuildMailExcelDoc.getTextFromTika(BuildMailExcelDoc.java:355)
    at HP.BuildMailExcelDoc.addExcelDoc(BuildMailExcelDoc.java:314)
    at HP.BuildMailExcelDoc.buildDoc(BuildMailExcelDoc.java:196)
    at HP.BuildMailExcelDoc.buildMailDoc(BuildMailExcelDoc.java:102)
    at HP.BuildMailExcelDoc.indexDirectory(BuildMailExcelDoc.java:69)
    at HP.BuildMailExcelDoc.indexDirectory(BuildMailExcelDoc.java:78)
    at HP.BuildMailExcelDoc.buildDoc(BuildMailExcelDoc.java:63)
    at HP.IndexGUI$1.mouseClicked(IndexGUI.java:281)
    at java.awt.AWTEventMulticaster.mouseClicked(Unknown Source)
    at java.awt.Component.processMouseEvent(Unknown Source)
    at javax.swing.JComponent.processMouseEvent(Unknown Source)
    at java.awt.Component.processEvent(Unknown Source)
    at java.awt.Container.processEvent(Unknown Source)
    at java.awt.Component.dispatchEventImpl(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Window.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.EventQueue.dispatchEvent(Unknown Source)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.run(Unknown Source)

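I think this problem is a result of the nested use of POI: once for the Excel sheet, and then again inside the Tika parse call.

Does that sound reasonable? How can I deal with this?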

Thanks :-) 罗伊斯

1 Answer

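It looks like you have two copies of POI on your classpath. My guess is that you have the newer version that ships with Tika, plus an older one. The problem is that Java picks up whichever version comes first on your classpath, and here that is the old one. That would explain the NoSuchFieldError: code compiled against the newer POI is looking for a field that the older class does not have.

Your solution is to remove the old version from the classpath. For how to identify where the old copy is coming from, see this POI FAQ entry.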
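If it helps, here is a minimal sketch of one way to check which jar a class is actually being loaded from, by asking its class loader for the location of the `.class` resource. The `WhichJar` class and its method names are made up for illustration, and the two default class names are only a guess at what is worth checking, based on the stack trace above; pass your own class names as arguments if you prefer.

```java
import java.net.URL;

final class WhichJar {

    /** Converts a class to its resource path, e.g. java.lang.String -> java/lang/String.class */
    static String resourceName(Class<?> cls) {
        return cls.getName().replace('.', '/') + ".class";
    }

    /** Returns the URL the class file was found at, or null if it cannot be determined. */
    static URL locate(Class<?> cls) {
        ClassLoader loader = cls.getClassLoader();
        // Classes loaded by the bootstrap loader report a null class loader,
        // so fall back to the system class loader lookup in that case.
        return loader != null
                ? loader.getResource(resourceName(cls))
                : ClassLoader.getSystemResource(resourceName(cls));
    }

    public static void main(String[] args) {
        // Assumed defaults: two POI classes that are likely relevant to the error above.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.poi.poifs.filesystem.POIFSFileSystem",
            "org.apache.poi.hwpf.HWPFDocument"
        };
        for (String name : names) {
            try {
                // 'false' avoids running static initializers; we only want the location.
                Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
                System.out.println(name + " -> " + locate(cls));
            } catch (ClassNotFoundException | LinkageError e) {
                System.out.println(name + " -> could not be loaded: " + e);
            }
        }
    }
}
```

Run it with the same classpath as your application. If the printed location points at a POI jar you did not expect (for example an older one that sits ahead of the copy Tika brought in), that is the one to remove.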

Answered 2011-09-07T14:13:02.727