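I am reading all the PDF files present on my system and writing their contents to a text file, "output.txt", using the command-line utility "pdftotext". However, when it reads improperly structured files (such as PDFs made of images, and many others), it prints errors such as: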

/home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
/home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
/home/vikrantsingh/Downloads/Tutorial.pdf
/home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (27972): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (41087): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (51900): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (62716): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (65450): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (68463): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'

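What I want is for it to move on to the next file as soon as it hits the first error, instead of continuing to read the same file. I am using Python 2.7. My code looks like this: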

    import os
    import subprocess

    root = '/home'
    targetpath = ""
    path = os.path.join(root, targetpath)
    count = 0
    filesize = 0
    for r, subdir, f in os.walk(path):
        # os.walk already yields r as a path that starts with `path`
        for file in f:
            if file.endswith(".pdf"):
                pdf_path = os.path.join(r, file)
                print(pdf_path)
                filesize += os.path.getsize(pdf_path)
                # Each call overwrites output.txt with the text of the current PDF
                subprocess.call(['pdftotext', pdf_path, 'output.txt'])
                count += 1

    print(count)                 # number of PDF files processed
    print(filesize / 1048576.0)  # total size in MiB

This is sample code that reads PDF files with "pdftotext". I want to catch the error so that I can continue with the next PDF.

I have seen one post about this. Thanks.

1 Answer

These error messages are generated by `pdftotext` itself. They are not Python exceptions, so they cannot be caught with `try..except`.

You can run `pdftotext -q` to silence the error messages:

    subprocess.call(['pdftotext', '-q', os.path.join(ultimate_path,file), 'output.txt'])
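
Note that `-q` only hides the messages; `pdftotext` still works through the whole file. If the goal is to abandon a file at its first error, one possible approach is to watch the child's standard error stream and stop the process as soon as anything shows up there. The following is only a rough sketch under assumptions: it assumes `pdftotext` writes its diagnostics to stderr one line at a time, and the helper names `run_until_first_error` and `pdf_to_text` are made up for illustration.

```python
import subprocess


def run_until_first_error(command):
    """Run `command` and kill it as soon as it writes a line to stderr.

    Returns True if the command finished with exit status 0 and wrote
    nothing to stderr, False otherwise.
    (Hypothetical helper, not part of any library.)
    """
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    # readline() blocks until the child writes a full line to stderr
    # or closes the stream (normally when it exits).
    first_error = proc.stderr.readline()
    if first_error:
        # Something was reported: stop working on this input.
        proc.kill()
    proc.stderr.close()
    returncode = proc.wait()
    return (not first_error) and returncode == 0


def pdf_to_text(pdf_path, txt_path):
    """Convert one PDF, giving up at the first message on stderr.

    Assumes a `pdftotext` executable is available on PATH.
    """
    return run_until_first_error(['pdftotext', pdf_path, txt_path])
```

In the loop from the question, the `subprocess.call([...])` line could then be swapped for a call to `pdf_to_text(...)` with the same PDF path and `'output.txt'`, skipping to the next file whenever it returns `False`. Be aware that a killed conversion may leave a partially written `output.txt` behind.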
answered 2013-02-23T11:32:35.777