python - Convert all PDF files in a directory to text

Question

I just downloaded PDFMiner to convert PDF files to text. I convert a file by running this command in the terminal:

python pdf2txt.py -o myOutput.txt simple1.pdf

It works fine. Now I want to embed that functionality in a simple Python script of mine, so that it converts every PDF file in a directory:

# Let's say I have an array with filenames in it
files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]

# And convert all PDF files to text
# by repeatedly executing pdf2txt.py
for x in range(0, len(files)):
    # And run something like (pseudocode, not valid Python)
    python pdf2txt.py -o output.txt files[x]

I also tried os.system, but a window (my terminal) flashed on screen. I just want to convert every file in the array to text.

score 1 · Accepted Answer

Use the subprocess module.

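import os
import subprocess

files = [
    'file1.pdf', 'file2.pdf', 'file3.pdf'
]
for f in files:
    # name the output after the input: file1.pdf -> file1.txt
    # (splitext keeps names such as report.v2.pdf intact)
    base = os.path.splitext(f)[0]
    # an argument list avoids shell quoting problems with spaces in names
    cmd = ['python', 'pdf2txt.py', '-o', base + '.txt', f]
    run = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           universal_newlines=True)
    out, err = run.communicate()

    # display errors if they occur
    if err:
        print(err)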

Read the subprocess documentation for more information.
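The list above is hard-coded, while the question asks about every PDF in a directory. Here is a minimal sketch of that variant, assuming Python 3.7+ and that pdf2txt.py sits in the current working directory. The names `build_command` and `convert_directory` are made up for illustration and are not part of PDFMiner.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, script="pdf2txt.py"):
    """Return the argument list that converts one PDF to a .txt file beside it."""
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")  # file1.pdf -> file1.txt
    # sys.executable is the interpreter that is running this script.
    return [sys.executable, script, "-o", str(txt_path), str(pdf_path)]


def convert_directory(directory, script="pdf2txt.py"):
    """Run the converter once per *.pdf in `directory`; return the PDFs that failed."""
    failed = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        result = subprocess.run(
            build_command(pdf_path, script),
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode the captured output to str
        )
        if result.returncode != 0:
            print(result.stderr, file=sys.stderr)
            failed.append(pdf_path)
    return failed
```

Calling `convert_directory("path/to/pdfs")` would then write one .txt file next to each PDF. Checking the return code rather than stderr is a deliberate choice here, since a tool can print warnings to stderr and still succeed.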

score 0 · Accepted Answer

There is an API that helps you with this kind of task. Read the documentation.

Answered 2013-05-11T12:43:45.930
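The answer does not say which API it means. As one possible reading, here is a sketch that assumes the maintained fork pdfminer.six is installed. That fork provides `pdfminer.high_level.extract_text`, while the original PDFMiner release may not. The function name `pdfs_to_text` and its `extract` parameter are hypothetical, added so the extractor can be swapped out.

```python
from pathlib import Path


def pdfs_to_text(directory, extract=None):
    """Write a .txt file next to every *.pdf in `directory`; return the written paths."""
    if extract is None:
        # Assumption: pdfminer.six is installed (pip install pdfminer.six).
        from pdfminer.high_level import extract_text as extract
    written = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")  # file1.pdf -> file1.txt
        txt_path.write_text(extract(str(pdf_path)), encoding="utf-8")
        written.append(txt_path)
    return written
```

Because everything runs inside one Python process, no child process is started for each file, which should also avoid the flashing terminal window mentioned in the question.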
