
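First of all, I have already gone through many related questions, and none of them helped. I want to be able to read doc and docx documents (by "read" I mean the simplest possible thing: extracting only the text). I have seen some posts about POI and its scratchpad jar, but I could not get it to work, and most of the time Eclipse could not even build my project...

Could someone give me a code example for both doc and docx, along with the names (or links) of all the jars I need?

Thanks!

Basically, this is the code: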

try {
    if (getFileExtention(path).equals("docx")) {
        FileInputStream fis = new FileInputStream(path);
        XWPFWordExtractor oleTextExtractor =
            new XWPFWordExtractor(new XWPFDocument(fis));
        return oleTextExtractor.getText();
    } else if (getFileExtention(path).equals("doc")) {
        FileInputStream fis = new FileInputStream(path);
        WordExtractor we = new WordExtractor(fis);
        return we.getText();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}


return "";

I have the following jars:

dom4j-1.6.1.jar

poi-3.8-20120326.jar

poi-ooxml-3.8-20120326.jar

poi-scratchpad-3.8-20120326.jar

xmlbeans-xmlpublic-2.4.0.jar

I have the following problems:

This one appears many times during the build:

> [2012-07-05 14:12:53 - iCards] Dx warning: Ignoring InnerClasses
> attribute for an anonymous inner class
> (org.dom4j.xpath.DefaultXPath$1) that doesn't come with an associated
> EnclosingMethod attribute. This class was probably produced by a
> compiler that did not target the modern .class file format. The
> recommended solution is to recompile the class from source, using an
> up-to-date compiler and without specifying any "-target" type options.
> The consequence of ignoring this warning is that reflective operations
> on this class will incorrectly indicate that it is *not* an inner
> class.

Another one (when trying to read a docx):

07-05 14:17:13.245: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
07-05 14:17:13.255: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
07-05 14:17:13.260: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
07-05 14:17:13.265: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
07-05 14:17:13.270: W/System.err(4339):     at libcore.io.Streams.readFully(Streams.java:81)
07-05 14:17:13.275: W/System.err(4339):     at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:230)
07-05 14:17:13.280: W/System.err(4339):     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51)
07-05 14:17:13.285: W/System.err(4339):     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:83)
07-05 14:17:13.290: W/System.err(4339):     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
07-05 14:17:13.295: W/System.err(4339):     at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
07-05 14:17:13.300: W/System.err(4339):     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:120)
07-05 14:17:13.305: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:504)
07-05 14:17:13.310: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
07-05 14:17:13.315: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
07-05 14:17:13.320: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
07-05 14:17:13.325: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
07-05 14:17:13.330: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
07-05 14:17:13.335: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
07-05 14:17:13.340: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
07-05 14:17:13.345: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
07-05 14:17:13.345: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
07-05 14:17:13.345: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
07-05 14:17:13.350: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
07-05 14:17:13.350: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
07-05 14:17:13.355: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
07-05 14:17:13.360: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
07-05 14:17:13.360: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
07-05 14:17:13.365: W/System.err(4339):     ... 24 more

And the last one, when trying to read a doc:

07-05 14:17:37.015: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
07-05 14:17:37.020: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
07-05 14:17:37.025: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
07-05 14:17:37.055: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
07-05 14:17:37.055: W/System.err(4339):     at java.io.InputStream.read(InputStream.java:163)
07-05 14:17:37.060: W/System.err(4339):     at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:95)
07-05 14:17:37.065: W/System.err(4339):     at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
07-05 14:17:37.070: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:509)
07-05 14:17:37.075: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
07-05 14:17:37.085: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
07-05 14:17:37.090: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
07-05 14:17:37.095: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
07-05 14:17:37.100: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
07-05 14:17:37.105: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
07-05 14:17:37.110: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
07-05 14:17:37.115: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
07-05 14:17:37.120: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
07-05 14:17:37.120: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
07-05 14:17:37.125: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
07-05 14:17:37.125: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
07-05 14:17:37.130: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
07-05 14:17:37.130: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
07-05 14:17:37.130: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
07-05 14:17:37.150: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
07-05 14:17:37.165: W/System.err(4339):     ... 20 more

3 Answers


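Tika supports the Microsoft Office formats as well as many others. It gives you a common interface to all of them and hides the complexity of learning and maintaining many different libraries. It is as simple as calling a single function. You can also use the Office parsers directly: OfficeParser or OOXMLParser.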

answered 2012-07-05T11:14:03.147

There are also very powerful tools such as the LibreOffice SDK (or OpenOffice 3), with which you can read and manipulate documents (such as docx) and save them in .txt format.

answered 2012-07-05T11:53:40.700
  • To read DOCX documents, we can use XWPFWordExtractor and XWPFDocument
  • To read DOC documents, we can use WordExtractor and HWPFDocument

Your code for DOCX documents is already correct:

XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));

But your code for DOC documents is missing HWPFDocument. Just change this line:

WordExtractor we = new WordExtractor(fis);

to this:

WordExtractor we = new WordExtractor(new HWPFDocument(fis));

As for the jar files, it looks like the only one missing from your build path is poi-ooxml-schemas-3.8-20120326.jar.
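
The two bullets above pair each format with its own extractor, and the question's snippet chooses between them with a `getFileExtention` helper that is never shown. As a rough sketch of that dispatch step only, the class below uses nothing but the JDK. The class name `WordFormatSniffer`, its methods, and the sample path are all hypothetical names made up for illustration. It also checks the file's leading bytes, on the assumption that a .doc is an OLE2 compound file and a .docx is a ZIP archive; those signatures identify the container, not Word specifically, so treat the result as a guess.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: decides which POI extractor to try for a given file.
final class WordFormatSniffer {

    enum WordFormat { DOC, DOCX, UNKNOWN }

    // Signature of an OLE2 compound file (container used by .doc, but also .xls/.ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header (container used by .docx and any other zip).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String getFileExtension(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // No dot in the file name, a leading dot (hidden file), or a trailing dot.
        if (dot <= slash + 1 || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Guesses the container format from the first bytes of a file. */
    static WordFormat detect(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return WordFormat.DOC;   // would go to WordExtractor / HWPFDocument
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return WordFormat.DOCX;  // would go to XWPFWordExtractor / XWPFDocument
        }
        return WordFormat.UNKNOWN;
    }

    /** Same check for a stream; consumes up to 8 bytes, so reopen it before parsing. */
    static WordFormat detect(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                break; // end of stream before 8 bytes were available
            }
            filled += n;
        }
        return detect(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        System.out.println(getFileExtension("/sdcard/Notes.DOCX")); // prints "docx"
        System.out.println(detect(zipHeader));                      // prints "DOCX"
    }
}
```

Because the stream overload consumes the header bytes, a caller would open a fresh `FileInputStream` before handing the file to either extractor, and close it afterwards.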

answered 2016-09-10T18:07:28.963