python - camelot python; OSError: exception: access violation writing 0x00000080

Question

I am trying to extract tables from a PDF file with Camelot.

Here is my code:

import camelot
tables = camelot.read_pdf('foo.pdf')
print(tables)

Running this script produces the following error:

  File "C:/Users/gibin/PycharmProjects/ML/Table_Tester.py", line 20, in <module>
    table=tables = camelot.read_pdf(r"C:\Users\gibin\PycharmProjects\ML\Doc_downloader\GWC_Docs\781313686.pdf")
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\io.py", line 117, in read_pdf
    **kwargs
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\handlers.py", line 172, in parse
    p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\parsers\lattice.py", line 403, in extract_tables
    self._generate_image()
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\parsers\lattice.py", line 220, in _generate_image
    with Ghostscript(*gs_call, stdout=null) as gs:
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\ext\ghostscript\__init__.py", line 95, in Ghostscript
    stderr=kwargs.get("stderr", None),
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\ext\ghostscript\__init__.py", line 39, in __init__
    rc = gs.init_with_args(instance, args)
  File "C:\Users\gibin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\camelot\ext\ghostscript\_gsprint.py", line 169, in init_with_args
    rc = libgs.gsapi_init_with_args(instance, len(argv), c_argv)

OSError: exception: access violation writing 0x00000080

Process finished with exit code 1
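The last frames show that the failure is raised inside Ghostscript (`gsapi_init_with_args`), which Camelot's lattice parser calls from `_generate_image` to render the page as an image. As a first check, the sketch below asks `ctypes.util.find_library` whether a Ghostscript library can be located at all. The library names are assumptions (on Windows, `gsdll32`/`gsdll64` picked by the interpreter's pointer size; elsewhere `gs`), not taken from Camelot's source, and a match only means a file was found, not that it is the right build.

```python
import ctypes.util
import struct
import sys


def ghostscript_library_candidate():
    """Return (bits, name, path) for the Ghostscript library this interpreter would need.

    Assumption: on Windows the DLL is named gsdll32/gsdll64 to match the
    interpreter's bitness; on other platforms the library is called "gs".
    """
    bits = struct.calcsize("P") * 8  # 32 for a 32-bit Python such as Python37-32
    if sys.platform.startswith("win"):
        name = f"gsdll{bits}"
    else:
        name = "gs"
    # find_library returns None when nothing matching is found
    path = ctypes.util.find_library(name)
    return bits, name, path


if __name__ == "__main__":
    bits, name, path = ghostscript_library_candidate()
    print(f"{bits}-bit Python, looking for {name!r}: {path or 'not found'}")
```

A 32-bit interpreter paired with a 64-bit Ghostscript (or the reverse) is one situation this would surface, though it is not confirmed as the cause here.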

How can I fix this, or is there another way to get tables out of a PDF?

Edit: the same script runs fine in a Jupyter notebook but fails in PyCharm.
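A common reason for one script to behave differently in Jupyter and PyCharm is that the two are configured with different interpreters or environments (the traceback points at a 32-bit `Python37-32`). That is an assumption, not a confirmed cause; a quick check is to run this standard-library snippet in both and compare the output. `interpreter_summary` is a hypothetical helper name.

```python
import platform
import struct
import sys


def interpreter_summary():
    """Collect the details that most often differ between two Python setups."""
    return {
        "executable": sys.executable,          # which python binary is running
        "version": platform.python_version(),  # e.g. "3.7.x"
        "bits": struct.calcsize("P") * 8,       # pointer size: 32 or 64
        "prefix": sys.prefix,                   # base of the active (virtual) environment
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If `executable` or `bits` differ between the two, the environments (and possibly their Camelot and Ghostscript installs) are not the same.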

score 1 · Accepted Answer

Did you install Camelot from PyPI, i.e. with pip install camelot-py[cv]?

After reinstalling Camelot from source, I no longer get this error:

git clone https://www.github.com/camelot-dev/camelot
cd camelot
pip install ".[cv]"
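After reinstalling, it is worth confirming that the interpreter actually imports the rebuilt copy, since PyCharm and Jupyter may use different environments. A standard-library sketch: `package_info` is a hypothetical helper, `camelot-py` is the distribution name from the pip command above, and `importlib.metadata` needs Python 3.8+, so on the 3.7 interpreter from the traceback only the import location is reported.

```python
import importlib.util

try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:  # older interpreters such as 3.7
    importlib_metadata = None


def package_info(import_name, dist_name):
    """Report where a package would be imported from and its installed version.

    Returns (origin, version); either is None if it cannot be determined.
    """
    spec = importlib.util.find_spec(import_name)
    origin = spec.origin if spec is not None else None
    version = None
    if importlib_metadata is not None:
        try:
            version = importlib_metadata.version(dist_name)
        except importlib_metadata.PackageNotFoundError:
            version = None
    return origin, version


if __name__ == "__main__":
    # "camelot" is the import name; "camelot-py" is the PyPI distribution name
    print(package_info("camelot", "camelot-py"))
```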

score 0 · Accepted Answer

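In my case, on Windows 7, I changed the flavor to "stream" and everything started working. The PDF file I was using has no visible table lines, and the "stream" flavor suits that kind of PDF, whereas Camelot's default flavor is "lattice".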

The code would look like this:

import camelot
tables = camelot.read_pdf('foo.pdf', flavor='stream')
print(tables)
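The question's traceback suggests why this can help: the crash happens in `_generate_image`, where the lattice parser renders the page through Ghostscript, whereas stream (as I understand Camelot's design, not something this answer states) works from the text positions in the PDF. A minimal sketch that tries lattice first and falls back to stream on an `OSError`; `read_tables_with_fallback` is a hypothetical helper, and an access violation may leave the process in an unreliable state, so treat this as a stopgap rather than a fix.

```python
def read_tables_with_fallback(path, read_pdf=None, **kwargs):
    """Try Camelot's lattice flavor, falling back to stream if it raises OSError.

    read_pdf defaults to camelot.read_pdf; it is a parameter only so the
    fallback logic can be exercised without Camelot or a PDF file.
    Returns (tables, flavor_used).
    """
    if read_pdf is None:
        import camelot  # imported lazily so the helper can be tested without Camelot
        read_pdf = camelot.read_pdf
    try:
        return read_pdf(path, flavor="lattice", **kwargs), "lattice"
    except FileNotFoundError:
        # also a subclass of OSError; a missing file is not a Ghostscript problem
        raise
    except OSError as exc:
        # e.g. "exception: access violation writing 0x00000080" from Ghostscript
        print(f"lattice failed ({exc}); retrying with stream")
        return read_pdf(path, flavor="stream", **kwargs), "stream"


if __name__ == "__main__":
    tables, flavor = read_tables_with_fallback("foo.pdf")
    print(flavor, tables)
```

Only pass keyword arguments accepted by both flavors (for example `pages`); lattice-only options would make the stream retry fail. Stream can also split or merge cells differently on ruled tables, so check the results.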

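I don't know why this happens, because if I run the same code on Debian 10 (the code that raises the error on Windows 7), everything works fine (although in the end no tables are detected).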

Edit: I ran this code in a Jupyter notebook. I don't know what happens when it runs in PyCharm.
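When stream runs without errors but finds no tables (as on Debian 10 above), it helps to look at what Camelot actually returned. The sketch assumes only that the result supports `len()` and indexing and that each table exposes `.df` (a pandas DataFrame) and `.parsing_report` (a dict with keys such as `page` and `accuracy`), as Camelot's `TableList`/`Table` do; `summarize_tables` is a hypothetical helper.

```python
def summarize_tables(tables):
    """Return one human-readable line per table found by camelot.read_pdf."""
    if len(tables) == 0:
        return ["no tables detected"]
    lines = []
    for index in range(len(tables)):
        table = tables[index]
        report = table.parsing_report  # e.g. {"accuracy": ..., "page": ...}
        rows, cols = table.df.shape
        lines.append(
            f"table {index}: page {report.get('page')}, "
            f"{rows} rows x {cols} cols, accuracy {report.get('accuracy')}"
        )
    return lines
```

Usage would be along the lines of `for line in summarize_tables(camelot.read_pdf("foo.pdf", flavor="stream")): print(line)`. If nothing is detected, stream's `table_areas` and `columns` arguments let you point it at the region of the page that holds the table.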
