
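I have a question about setting a maximum running time in Python. I want to use pdfminer to convert PDF files to .txt. The problem is that some files often cannot be decoded and take a very long time. So I want to use time.time() to limit the conversion of each file to 20 seconds. Also, I am running on Windows, so I cannot use the signal function.

I managed to run the conversion code with pdfminer.convert_pdf_to_txt() (in my code it is "c"), but I could not integrate time.time() into the while loop. It seems to me that in the code below, the while loop and time.time() do not work.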

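In summary, I want to:

  1. Convert each PDF file to a .txt file

  2. Limit each conversion to 20 seconds. If it times out, raise an exception and save an empty file

  3. Save all the txt files in the same folder

  4. If there is any exception/error, still save the file, but with empty content.

This is the current code: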

import converter as c
import os
import timeit
import time

yourpath = 'D:/hh/'

for root, dirs, files in os.walk(yourpath, topdown=False):

    for name in files:

        t_end = time.time() + 20

        try:
            while time.time() < t_end:

                c.convert_pdf_to_txt(os.path.join(root, name))

                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g = str(a.split("\\")[1])
                with open("D:/f/" + g + "&" + t + "&" + name + ".txt", mode="w") as newfile:
                    newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                    print "yes"

            if time.time() > t_end:

                print "no"

                with open("D:/f/" + g + "&" + t + "&" + name + ".txt", mode="w") as newfile:
                    newfile.write("")

        except KeyboardInterrupt:
           raise

        except:
            for name in files:
                t = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a = str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])

                g = str(a.split("\\")[1])
                with open("D:/f/" + g + "&" + t + "&" + name + ".txt", mode="w") as newfile:
                    newfile.write("")

1 Answer

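Your approach is wrong.

You define the end time and then immediately enter the while loop, because at that point the current timestamp is lower than the end timestamp (the condition is always True). So the loop is entered and you are stuck in the conversion function: the condition is not checked again until that call returns.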
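
To make the point above concrete, here is a minimal sketch of the same loop pattern. `slow_call` is a hypothetical stand-in for the conversion (it just sleeps), and the 0.2 s limit and 0.5 s duration are made-up values chosen to keep the demo short.

```python
import time

def slow_call(duration):
    """Hypothetical stand-in for a conversion that blocks for `duration` seconds."""
    time.sleep(duration)

def loop_with_deadline(limit, duration):
    """Mimic the question's pattern and return the seconds actually spent."""
    start = time.time()
    t_end = start + limit
    while time.time() < t_end:
        # The condition above is evaluated only between iterations,
        # so nothing can stop slow_call once it has started.
        slow_call(duration)
    return time.time() - start

elapsed = loop_with_deadline(limit=0.2, duration=0.5)
print("limit 0.2 s, actually spent %.1f s" % elapsed)
```

The time spent is set by the blocking call, not by the limit. The same pattern also has the opposite problem: if the call returns before the deadline, the loop simply runs it again.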

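I would suggest the signal module, which is already included in Python. It lets you abort a function after n seconds. A basic example can be seen in this Stack Overflow answer.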
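
For reference, the signal-based pattern could look roughly like the sketch below. It relies on `signal.alarm` and `SIGALRM`, which exist only on Unix-like systems, so it will not run on Windows. It must also be called from the main thread. `ConversionTimeout`, `call_with_alarm`, and `spin_forever` are hypothetical names used for illustration only.

```python
import signal
import time

class ConversionTimeout(Exception):
    """Hypothetical exception raised when the time limit is exceeded."""

def _on_alarm(signum, frame):
    raise ConversionTimeout()

def call_with_alarm(func, seconds):
    """Run func(); raise ConversionTimeout after `seconds` whole seconds (Unix only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler

def spin_forever():
    """Hypothetical stand-in for a conversion that never finishes."""
    while True:
        time.sleep(0.05)

try:
    call_with_alarm(spin_forever, 1)
    result = "finished"
except ConversionTimeout:
    result = "timed out"
print(result)
```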

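Because signal.alarm is not available on Windows, your code would use a threading.Timer instead, something like this: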

import converter as c
import os
import threading

try:
    import _thread as thread  # Python 3
except ImportError:
    import thread  # Python 2

yourpath = 'D:/hh/'
timeout = 20.0  # seconds allowed per file, as asked in the question

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        path = os.path.join(root, name)

        # Output name, built as in the question (assumes that folder layout).
        t = os.path.split(os.path.dirname(path))[1]
        a = str(os.path.split(os.path.dirname(path))[0])
        g = str(a.split("\\")[1])
        outname = "D:/f/" + g + "&" + t + "&" + name + ".txt"

        text = ""  # stays empty on timeout or on any conversion error

        # When the timer fires, KeyboardInterrupt is raised in the main thread.
        timer = threading.Timer(timeout, thread.interrupt_main)
        timer.start()
        try:
            text = c.convert_pdf_to_txt(path)
            print("yes")
        except KeyboardInterrupt:
            # Timed out (a real Ctrl+C during conversion looks the same).
            print("no")
        except Exception:
            print("error")
        finally:
            timer.cancel()

        # Always save a file; it is empty if the conversion failed.
        with open(outname, mode="w") as newfile:
            newfile.write(text)
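
The timer technique can also be pulled out into a small helper, which keeps the file loop shorter. This is only a sketch under a few assumptions: `run_with_timeout` and `busy` are hypothetical names, the helper has to be called from the main thread, and the interrupt only lands while the main thread is executing Python bytecode (a long call inside a C extension is not cut short until it returns).

```python
import threading
import time

try:
    import _thread as thread  # Python 3
except ImportError:
    import thread  # Python 2

def run_with_timeout(func, seconds, default=""):
    """Return func(), or `default` if it is still running after `seconds`."""
    timer = threading.Timer(seconds, thread.interrupt_main)
    timer.start()
    try:
        return func()
    except KeyboardInterrupt:
        # Raised in the main thread by the timer (or by a real Ctrl+C).
        return default
    finally:
        timer.cancel()

def busy(duration):
    """Hypothetical slow function: a pure-Python loop lasting `duration` seconds."""
    end = time.time() + duration
    while time.time() < end:
        pass
    return "done"

fast = run_with_timeout(lambda: busy(0.1), 2.0)
slow = run_with_timeout(lambda: busy(3.0), 0.5)
print(repr(fast), repr(slow))
```

With such a helper, the body of the file loop could reduce to something like `text = run_with_timeout(lambda: c.convert_pdf_to_txt(path), 20.0)` followed by writing `text` to the output file.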

Just for the future: indent with four spaces and avoid excess whitespace ;)

answered 2016-11-22T14:47:33.103