python-3.x - What causes AttributeError: 'list' object has no attribute 'read' when reading a PDF with Tabula?

Question

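I am trying to use Tabula to extract table data from a PDF and convert it to a pandas DataFrame. I have been following the steps in this tutorial:

When I try to load a remote PDF into my Jupyter notebook with the following code (taken directly from the tutorial):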

import tabula
df2 = tabula.read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/arabic.pdf")

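I get the error:

AttributeError: 'list' object has no attribute 'read'

I also tried reading a PDF saved locally on my machine, but I get the same error. I believe I have installed Java and configured the environment variables correctly, and I have the latest version of Tabula.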

Link to a screenshot of my Jupyter notebook:

Thanks.

score 1 · Accepted Answer

Make sure you installed the correct tabula package!

If you ran pip3 install tabula, you installed an impostor!

Run pip3 uninstall tabula to remove it, then run:

pip3 install tabula-py

to install the correct package.
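To confirm which of the two packages is actually present before reinstalling, a quick check along these lines may help. This is only a sketch: it uses the standard-library importlib.metadata (Python 3.8+) to look up the distribution names tabula and tabula-py mentioned above. The helper names installed_version and diagnose are made up for illustration, and the advice for the case where both are installed is an assumption about how to get back to a clean state.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose(tabula_version, tabula_py_version):
    """Turn the two lookup results into a short suggestion (hypothetical helper)."""
    if tabula_version is not None and tabula_py_version is None:
        # Only the impostor is installed: the case described in the answer.
        return "Wrong package: run 'pip3 uninstall tabula', then 'pip3 install tabula-py'."
    if tabula_version is not None and tabula_py_version is not None:
        # Assumption: removing both and reinstalling is the safest cleanup.
        return ("Both installed: run 'pip3 uninstall tabula tabula-py', "
                "then 'pip3 install tabula-py'.")
    if tabula_py_version is not None:
        return f"tabula-py {tabula_py_version} is installed; no impostor found."
    return "Neither is installed: run 'pip3 install tabula-py'."


if __name__ == "__main__":
    print(diagnose(installed_version("tabula"), installed_version("tabula-py")))
```

Run it with the same interpreter the Jupyter kernel uses; otherwise it may report on a different environment than the one raising the error.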
