4

I have a Python script containing a big loop that reads a file and does some work on each line (I am using several packages such as urllib2, httplib2 and BeautifulSoup).

It looks like this:

try:
    with open(fileName, 'r') as file :
        for i, line in enumerate(file):
            try:
                # a lot of code
                # ....
                # ....
            except urllib2.HTTPError:
                print "\n >>> HTTPError"
            # a lot of other exceptions
            # ....
            except (KeyboardInterrupt, SystemExit):
                print "Process manually stopped"
                raise
            except Exception, e:
                print(repr(e))
except (KeyboardInterrupt, SystemExit):
    print "Process manually stopped"
    # some stuff

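The problem is that the program stops when I press Ctrl+C, but the interrupt is not caught by either of my two KeyboardInterrupt handlers, even though I am sure execution is inside the loop at that moment (and therefore at least inside the outer try/except).

How is that possible? At first I thought one of the packages I use was not handling exceptions properly (for example by using a bare "except:"), but if that were the case my script would not stop. The script does stop, though, so the interrupt should be caught by at least one of my two except clauses, right?

What am I getting wrong?

Thanks in advance!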

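EDIT:

After adding a finally: clause after the try-except and printing the traceback in both try-except blocks, I usually get None when I press Ctrl+C, but once I managed to get the following (it seems to come from urllib2, but I don't know whether it is the reason I can't catch the KeyboardInterrupt):

Traceback (most recent call last):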

File "/home/darcot/code/Crawler/crawler.py", line 294, in get_articles_from_file
  content = Extractor(extractor='ArticleExtractor', url=url).getText()
File "/usr/local/lib/python2.7/site-packages/boilerpipe/extract/__init__.py", line 36, in __init__
  connection  = urllib2.urlopen(request)
File "/usr/local/lib/python2.7/urllib2.py", line 126, in urlopen
  return _opener.open(url, data, timeout)
File "/usr/local/lib/python2.7/urllib2.py", line 391, in open
  response = self._open(req, data)
File "/usr/local/lib/python2.7/urllib2.py", line 409, in _open
  '_open', req)
File "/usr/local/lib/python2.7/urllib2.py", line 369, in _call_chain
  result = func(*args)
File "/usr/local/lib/python2.7/urllib2.py", line 1173, in http_open
  return self.do_open(httplib.HTTPConnection, req)
File "/usr/local/lib/python2.7/urllib2.py", line 1148, in do_open
  raise URLError(err)
URLError: <urlopen error [Errno 4] Interrupted system call>

2 Answers

3

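I already suggested in the comments on this question that the problem is most likely caused by the part of the code that was left out of the question. The exact code should not matter, however, because Python normally raises a KeyboardInterrupt exception when Python code is interrupted with Ctrl-C.

You mentioned in the comments that you use the boilerpipe Python package. This package uses JPype to create the language binding to Java... I can reproduce your problem with the following Python program: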

from boilerpipe.extract import Extractor
import time

try:
  for i in range(10):
    time.sleep(1)

except KeyboardInterrupt:
  print "Keyboard Interrupt Exception"

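If you interrupt this program with Ctrl-C, no exception is raised. The program seems to terminate immediately, without giving the Python interpreter a chance to raise the exception. Once the import of boilerpipe is removed, the problem disappears...

A debugging session with gdb shows that Python starts a large number of threads when boilerpipe is imported: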

gdb --args python boilerpipe_test.py
[...]
(gdb) run
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffef62b700 (LWP 3840)]
[New Thread 0x7fffef52a700 (LWP 3841)]
[New Thread 0x7fffef429700 (LWP 3842)]
[New Thread 0x7fffef328700 (LWP 3843)]
[New Thread 0x7fffed99a700 (LWP 3844)]
[New Thread 0x7fffed899700 (LWP 3845)]
[New Thread 0x7fffed798700 (LWP 3846)]
[New Thread 0x7fffed697700 (LWP 3847)]
[New Thread 0x7fffed596700 (LWP 3848)]
[New Thread 0x7fffed495700 (LWP 3849)]
[New Thread 0x7fffed394700 (LWP 3850)]
[New Thread 0x7fffed293700 (LWP 3851)]
[New Thread 0x7fffed192700 (LWP 3852)]

The gdb session without the boilerpipe import:

gdb --args python boilerpipe_test.py
[...]
(gdb) r
Starting program: /home/fabian/Experimente/pykeyinterrupt/bin/python boilerpipe_test.py
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7529533 in __select_nocancel () from /usr/lib/libc.so.6
(gdb) signal 2
Continuing with signal SIGINT.
Keyboard Interrupt Exception
[Inferior 1 (process 3904) exited normally]

So I assume that your Ctrl-C signal gets handled in a different thread, or that jpype does some other odd thing that breaks the handling of Ctrl-C.
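For comparison, here is a minimal sketch of the baseline behaviour without any Java binding. It assumes a POSIX system and CPython, and it uses os.kill as a stand-in for pressing Ctrl-C; the names worker and run_demo are made up for this illustration. It only shows that ordinary Python threads do not, by themselves, stop KeyboardInterrupt from reaching the main thread. It says nothing definite about what JPype or the JVM actually does with the signal.

```python
import os
import signal
import threading
import time


def worker(stop_event):
    """Idle background thread, standing in for threads a library might start."""
    while not stop_event.is_set():
        time.sleep(0.05)


def run_demo():
    """Send SIGINT to this process and report which thread saw KeyboardInterrupt."""
    # Make sure Python's default handler is active; it is not installed
    # if the process was started with SIGINT ignored.
    signal.signal(signal.SIGINT, signal.default_int_handler)

    stop_event = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop_event,))
               for _ in range(4)]
    for t in threads:
        t.start()

    caught_in = None
    try:
        os.kill(os.getpid(), signal.SIGINT)  # POSIX stand-in for Ctrl-C
        time.sleep(2)  # normally cut short once the signal is processed
    except KeyboardInterrupt:
        caught_in = threading.current_thread().name
    finally:
        stop_event.set()
        for t in threads:
            t.join()
    return caught_in


print(run_demo())
```

If this reports the main thread on your machine while the boilerpipe variant still dies silently, that points at the native side (JPype or the JVM) rather than at the number of threads as such.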

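EDIT: As a possible workaround, you can register a signal handler for SIGINT that catches the signal the process receives when you press Ctrl-C. The signal handler is triggered even when boilerpipe and JPype are imported. This way you are notified when the user presses Ctrl-C, and you can handle the event at a central point in your program. You can terminate the script inside this handler if you want to. If you don't, the script continues from the point where it was interrupted as soon as the handler returns. See the example below: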

from boilerpipe.extract import Extractor
import time
import signal
import sys

def interrupt_handler(signum, frame):
    print "Signal handler!!!"
    # Terminate here: installing a handler replaces the default Ctrl-C behaviour
    sys.exit(-2)

signal.signal(signal.SIGINT, interrupt_handler)

try:
    for i in range(10):
        time.sleep(1)
#    your_url = "http://www.zeit.de"
#    extractor = Extractor(extractor='ArticleExtractor', url=your_url)
except KeyboardInterrupt:
    print "Keyboard Interrupt Exception" 
answered 2014-08-12T19:14:16.660
0

Most likely you pressed CTRL-C while your script was outside the try block, so the signal was not caught.

answered 2014-08-12T17:12:02.230