python - Python tabula .convert_into skips multiple spaces (words separated by multiple spaces in the PDF are joined with no space)

Translated from: https://stackoverflow.com/questions/55603062 2019-04-10T00:29:01.107

Viewed 286 times

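I am using tabula.convert_into to write a CSV. It captures everything well, except that text like this in the PDF (there are 2 spaces between DEV and HH):

"DEV__HH WorldSummit Re Estimates"

becomes this in the CSV:

"DEVHH WorldSummit Re Estimates"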

I have tried encoding='utf-8' and other libraries such as PyPDF2, but tabula has given me the best results so far. This is the one odd thing I still need to fix.

import tabula

file = 'input.pdf'

tabula.convert_into(file, "output.csv", pages='all', output_format="csv", encoding='utf-8')
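One way to narrow this down, offered only as a sketch: tabula-py accepts `lattice` and `stream` options that select different extraction modes, so it may be worth writing one CSV per mode and checking whether either keeps the gap. Whether either mode helps for this particular PDF is not confirmed here. The helper names `convert_with_modes` and `spaces_were_dropped` and the output file naming are hypothetical, introduced only for illustration.

```python
import re


def convert_with_modes(pdf_path, out_prefix):
    """Write one CSV per extraction mode so the results can be compared."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    outputs = {}
    for mode in ("lattice", "stream"):
        out_path = f"{out_prefix}_{mode}.csv"  # hypothetical naming scheme
        # Passes lattice=True or stream=True to pick the extraction mode.
        tabula.convert_into(
            pdf_path, out_path, output_format="csv", pages="all", **{mode: True}
        )
        outputs[mode] = out_path
    return outputs


def spaces_were_dropped(expected, extracted):
    """True if both strings contain the same non-whitespace characters
    but split into different words, i.e. a separator was lost or moved."""
    same_letters = re.sub(r"\s+", "", expected) == re.sub(r"\s+", "", extracted)
    same_words = expected.split() == extracted.split()
    return same_letters and not same_words
```

After calling `convert_with_modes('input.pdf', 'output')`, each resulting CSV can be read back and its cells passed to `spaces_were_dropped` together with the text expected from the PDF, which flags cells such as "DEVHH WorldSummit Re Estimates".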

0 answers