I am using tabula-0.9.2 with Python 3.6.1 and Java version "1.8.0_45" to extract tables from some PDFs, as follows:

from tabula import read_pdf_table
# pdf_file holds the path to the PDF being processed; only page 1 is read
read_pdf_table(pdf_file, pages=1, silent=True)

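This works in most cases, but a few of the PDFs produce the exception below. Does anyone know how to find the root cause? Is there a read_pdf_table parameter I am missing that could be responsible? I believe all my dependency versions are correct, unless I have overlooked something. Please advise. Thanks.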

Jul 13, 2017 3:52:31 PM org.apache.pdfbox.pdfviewer.PageDrawer processTextPosition
SEVERE: java.io.IOException: Problem reading font data.
java.io.IOException: Problem reading font data.
        at java.awt.Font.createFont0(Font.java:1000)
        at java.awt.Font.createFont(Font.java:877)
        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:471)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:110)
        at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:260)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:504)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:56)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:562)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:269)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:236)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:216)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
        at technology.tabula.detectors.NurminenDetectionAlgorithm.detect(NurminenDetectionAlgorithm.java:93)
        at technology.tabula.CommandLineApp$TableExtractor.extractTablesBasic(CommandLineApp.java:372)
        at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:359)
        at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:166)
        at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:123)
        at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:104)
        at technology.tabula.CommandLineApp.main(CommandLineApp.java:74)