python - Start the Python debugger in the oldest stack frame after an exception

Question

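I use the --pdb option with ipython, so when I'm debugging code and an error occurs, it shows a stack trace. Many of these errors come from calling numpy or pandas functions with bad input. The stack trace starts at the newest frame, inside the code of those libraries. Only after repeating the up command 5-10 times can I actually see what I did wrong, which is immediately obvious 90% of the time (e.g. calling with a list instead of an array).

Is there any way to specify which stack frame the debugger initially starts in? Either the oldest stack frame, or the newest stack frame in the python file that was initially run, or something similar. This would make debugging much more productive.

Here is a simple example: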

import pandas as pd

def test(df):  # (A)
    df[:,0] = 4 #Bad indexing on dataframe, will cause error
    return df

df = test(pd.DataFrame(range(3))) # (B)

The resulting traceback, with the markers (A), (B) and (C) added for clarity:

In [6]: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-66730543fac0> in <module>()
----> 1 import codecs, os;__pyfile = codecs.open('''/tmp/py29142W1d''', encoding='''utf-8''');__code = __pyfile.read().encode('''utf-8''');__pyfile.close();os.remove('''/tmp/py29142W1d''');exec(compile(__code, '''/test/stack_frames.py''', 'exec'));

/test/stack_frames.py in <module>()
      6 
      7 if __name__ == '__main__':
(A)----> 8     df = test(pd.DataFrame(range(3)))

/test/stack_frames.py in test(df)
      2 
      3 def test(df):
(B)----> 4     df[:,0] = 4
      5     return df
      6 

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2355         else:
   2356             # set column
-> 2357             self._set_item(key, value)
   2358 
   2359     def _setitem_slice(self, key, value):

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2421 
   2422         self._ensure_valid_index(value)
-> 2423         value = self._sanitize_column(key, value)
   2424         NDFrame._set_item(self, key, value)
   2425 

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value)
   2602 
   2603         # broadcast across multiple columns if necessary
-> 2604         if key in self.columns and value.ndim == 1:
   2605             if (not self.columns.is_unique or
   2606                     isinstance(self.columns, MultiIndex)):

/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.pyc in __contains__(self, key)
   1232 
   1233     def __contains__(self, key):
-> 1234         hash(key)
   1235         # work around some kind of odd cython bug
   1236         try:

TypeError: unhashable type
> /usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py(1234)__contains__()
   1232 
   1233     def __contains__(self, key):
(C)-> 1234         hash(key)
   1235         # work around some kind of odd cython bug
   1236         try:

ipdb>

Ideally, I would like the debugger to start in the second-oldest frame at (B), or even at (A). But definitely not at (C), which is where it goes by default.

score 3 · Accepted Answer

A long answer, written to document the process for myself. There is a semi-working solution at the bottom.

A failed attempt:

import sys
import pdb
import pandas as pd

def test(df):  # (A)
    df[:,0] = 4 #Bad indexing on dataframe, will cause error
    return df

mypdb = pdb.Pdb(skip=['pandas.*'])
mypdb.reset()

df = test(pd.DataFrame(range(3))) # (B) # fails.

mypdb.interaction(None, sys.last_traceback)  # doesn't work.

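The Pdb documentation on skip:

The skip argument, if given, must be an iterable of glob-style module name patterns. The debugger will not step into frames that originate in a module that matches one of these patterns.

The pdb source code: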
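As a small illustration of how such glob patterns behave, here is a sketch that uses only `fnmatch`, the same matching function that `bdb.Bdb.is_skipped_module` calls. The helper name `is_skipped` and the module names are made up for the example. Note that `pandas.*` matches submodules but not the top-level `pandas` module itself:

```python
import fnmatch

patterns = ['pandas.*']


def is_skipped(module_name, patterns):
    # Hypothetical helper: the same per-pattern test that bdb applies
    return any(fnmatch.fnmatch(module_name, p) for p in patterns)


print(is_skipped('pandas.core.frame', patterns))  # True
print(is_skipped('pandas', patterns))             # False
print(is_skipped('__main__', patterns))           # False
```

If the top-level package should be skipped as well, passing both `'pandas'` and `'pandas.*'` is one option.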

class Pdb(bdb.Bdb, cmd.Cmd):

    _previous_sigint_handler = None

    def __init__(self, completekey='tab', stdin=None, stdout=None, skip=None,
                 nosigint=False, readrc=True):
        bdb.Bdb.__init__(self, skip=skip)
        [...]

# Post-Mortem interface

def post_mortem(t=None):
    # handling the default
    if t is None:
        # sys.exc_info() returns (type, value, traceback) if an exception is
        # being handled, otherwise it returns None
        t = sys.exc_info()[2]
    if t is None:
        raise ValueError("A valid traceback must be passed if no "
                         "exception is being handled")

    p = Pdb()
    p.reset()
    p.interaction(None, t)

def pm():
    post_mortem(sys.last_traceback)

The bdb source code:

class Bdb:
    """Generic Python debugger base class.
    This class takes care of details of the trace facility;
    a derived class should implement user interaction.
    The standard debugger class (pdb.Pdb) is an example.
    """

    def __init__(self, skip=None):
        self.skip = set(skip) if skip else None
    [...]
    def is_skipped_module(self, module_name):
        for pattern in self.skip:
            if fnmatch.fnmatch(module_name, pattern):
                return True
        return False

    def stop_here(self, frame):
        # (CT) stopframe may now also be None, see dispatch_call.
        # (CT) the former test for None is therefore removed from here.
        if self.skip and \
               self.is_skipped_module(frame.f_globals.get('__name__')):
            return False
        if frame is self.stopframe:
            if self.stoplineno == -1:
                return False
            return frame.f_lineno >= self.stoplineno
        if not self.stopframe:
            return True
        return False
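One way to check whether the skip list influences the post-mortem stack is to build that stack by hand. The sketch below is only an illustration: it uses the stdlib `json` package as a stand-in for pandas, and `parse` is a made-up function. It calls `get_stack` the same way `Pdb.setup` does for a traceback and looks at which frame ends up selected:

```python
import bdb
import json


def parse(text):
    return json.loads(text)  # the exception is raised inside the json package


debugger = bdb.Bdb(skip=['json', 'json.*'])
debugger.reset()

try:
    parse('{')
except ValueError as exc:
    # Post-mortem setup builds its stack from the traceback with get_stack
    stack, index = debugger.get_stack(None, exc.__traceback__)

module_names = [frame.f_globals.get('__name__') for frame, _lineno in stack]
selected = module_names[index]

print(module_names)
# The selected frame is the newest one, even though its module is skipped
print(selected, debugger.is_skipped_module(selected))
```

The selected index points at the newest frame, which lives in a module that matches the skip patterns. `get_stack` never consults `self.skip`; only `stop_here` does, and that is used for stepping.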

Clearly, the skip list is not used for post-mortem debugging. To work around this, I created a custom class that overrides the setup method.

import pdb
import sys

class SkipPdb(pdb.Pdb):
    def setup(self, f, tb):
        # This is unchanged
        self.forget()
        self.stack, self.curindex = self.get_stack(f, tb)
        while tb:
            # when setting up post-mortem debugging with a traceback, save all
            # the original line numbers to be displayed along the current line
            # numbers (which can be different, e.g. due to finally clauses)
            lineno = pdb.lasti2lineno(tb.tb_frame.f_code, tb.tb_lasti)
            self.tb_lineno[tb.tb_frame] = lineno
            tb = tb.tb_next

        self.curframe = self.stack[self.curindex][0]
        # This loop is new
        while self.is_skipped_module(self.curframe.f_globals.get('__name__')):
            self.curindex -= 1
            self.stack.pop()
            self.curframe = self.stack[self.curindex][0]
        # The rest is unchanged.
        # The f_locals dictionary is updated from the actual frame
        # locals whenever the .f_locals accessor is called, so we
        # cache it here to ensure that modifications are not overwritten.
        self.curframe_locals = self.curframe.f_locals
        return self.execRcLines()

    def pm(self):
        self.reset()
        self.interaction(None, sys.last_traceback)

If you use it like this:

x = 42
df = test(pd.DataFrame(range(3))) # (B) # fails.
# fails. Then do:
mypdb = SkipPdb(skip=['pandas.*'])
mypdb.pm()
>> <ipython-input-36-e420cf1b80b2>(2)<module>()
>-> df = test(pd.DataFrame(range(3))) # (B) # fails.
> (Pdb) l
>  1    x = 42
>  2  ->    df = test(pd.DataFrame(range(3))) # (B) # fails.
> [EOF]

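You are dropped into the right frame. Now you just need to figure out how ipython calls its pdb pm/post_mortem function and create a similar script. That seems hard, so I pretty much gave up here.

This is also not a great implementation. It assumes that the frames you want to skip are at the top of the stack, and it produces strange results otherwise. For example, an error in a function passed to df.apply will produce something very weird.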
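For the interleaved case, one possible direction is to search the whole traceback for the newest frame that is not skipped, instead of popping frames off the top. This is a rough sketch and not part of pdb or IPython: `newest_unskipped_frame` is a hypothetical helper, and `json` again stands in for the library whose frames should be skipped.

```python
import fnmatch
import json
import traceback


def newest_unskipped_frame(tb, skip_patterns):
    """Return the newest frame in tb whose module matches no skip pattern.

    Returns None if every frame in the traceback is skipped.
    """
    chosen = None
    # walk_tb yields frames from oldest to newest
    for frame, _lineno in traceback.walk_tb(tb):
        module_name = frame.f_globals.get('__name__') or ''
        if not any(fnmatch.fnmatch(module_name, p) for p in skip_patterns):
            chosen = frame  # keep overwriting so the newest match wins
    return chosen


def parse(text):
    return json.loads(text)  # fails inside the json package


try:
    parse('{')
except ValueError as exc:
    tb = exc.__traceback__
    frame = newest_unskipped_frame(tb, ['json', 'json.*'])

print(frame.f_code.co_name)  # parse
```

A debugger subclass could then move its current index to that frame rather than truncating the stack, so the library frames stay reachable with down. Wiring this into a debugger is left open here.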

TL;DR: the stdlib doesn't support this. You can create your own debugger class, but getting it to work with IPython's debugger is not trivial.
