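I'm reading every PDF file on my system and converting each one to the text file "output.txt" with the command-line utility "pdftotext". When it reads a file that isn't properly structured (for example a PDF made up of images, and many others), it raises errors such as: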
/home/vikrantsingh/Downloads/ARRAYS_NEW.pdf
/home/vikrantsingh/Downloads/GPOS_casestudy_solution_v2.pdf
/home/vikrantsingh/Downloads/Tutorial.pdf
/home/vikrantsingh/Downloads/The_C_Programming_Language.pdf
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (27972): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (41087): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (51900): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (62716): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (65450): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
Error (68463): No font in show
Error: Missing language pack for 'Adobe-Japan1' mapping
Error: Unknown font tag 'C0_0'
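What I want is for it to move on to the next file as soon as it hits the first error, instead of continuing to read the same file. I'm using Python 2.7. My code looks like this: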
import os
import sys
import re
import subprocess

root = '/home'
targetpath = ""
path = os.path.join(root, targetpath)
filepath = []
count = 0
filesize = 0

for r,subdir,f in os.walk(path):
    ultimate_path = os.path.join(path,r)
    for file in f:
        if file.find(".pdf")!=-1:
            print os.path.join(ultimate_path,file)
            filesize = os.path.getsize(os.path.join(ultimate_path,file))+filesize
            subprocess.call(['pdftotext', os.path.join(ultimate_path,file), 'output.txt'])
            #print file
            count = count+1

print count
print filesize/(1048576.0)
This is sample code that reads PDF files with "pdftotext". I want to catch the errors so that I can move on to the next PDF.
I have read a post about this. Thanks.
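
For reference, here is a minimal sketch of one way the "stop at the first error" behaviour could be done. It assumes that pdftotext writes its "Error: ..." messages to stderr and that any stderr output at all should count as a failure. The function names run_until_first_error and convert_tree are made up for illustration. The idea is to start the process with subprocess.Popen instead of subprocess.call, read a single line from its stderr, and kill the process if that line is not empty.

```python
import os
import subprocess


def run_until_first_error(cmd):
    """Run cmd and kill it as soon as it writes a line to stderr.

    Returns True only if the command exited with status 0 and wrote
    nothing to stderr. (Assumption: any stderr output means failure.)
    """
    # Only stderr is piped; stdout is inherited from the parent process.
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    # readline() returns at the first message, or an empty string once
    # the process closes stderr (normally when it exits).
    first_message = proc.stderr.readline()
    if first_message:
        try:
            proc.kill()  # stop working on this file
        except OSError:
            pass  # the process had already gone away
    proc.stderr.close()
    proc.wait()
    return not first_message and proc.returncode == 0


def convert_tree(root, output='output.txt'):
    """Hypothetical driver: run pdftotext on every *.pdf under root."""
    converted, skipped = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith('.pdf'):
                continue
            pdf_path = os.path.join(dirpath, name)
            if run_until_first_error(['pdftotext', pdf_path, output]):
                converted += 1
            else:
                skipped += 1
                print('skipped %s' % pdf_path)
    return converted, skipped
```

Calling convert_tree('/home') would then walk the same tree as the code in the question and return how many files were converted and how many were skipped. A killed run may leave a partial output.txt behind. As in the question, every file is written to the same output.txt, so each conversion overwrites the previous one.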