
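Despite reading many articles on the subject (including [this][1] very popular post on SO), I'm having a hard time getting a good grasp of decorators. I suspect I must be stupid, but with all the stubbornness that comes with being stupid, I've decided to try to figure this out.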

That said, I suspect I have a good use case...

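Below is some code from a project of mine that extracts text from PDF files. Processing involves three steps:

  1. Set up the PDFMiner objects needed to process the PDF file (boilerplate initialization).
  2. Apply a processing function to the PDF file.
  3. No matter what happens, close the file.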

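I recently learned about context managers and the with statement, and this seemed like a good use case for them. So I started by defining the PDFMinerWrapper class: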

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        parser = PDFParser(self.pdf)  # create a parser object associated with the file object
        doc = PDFDocument()  # create a PDFDocument object that stores the document structure
        parser.set_document(doc)  # connect the parser and document objects
        doc.set_parser(parser)
        doc.initialize(self.pdf_pwd)  # pass '' if no password required
        return doc

    def __exit__(self, type, value, traceback):
        self.pdf.close()
        # if we have an error, catch it, log it, and return the info
        if isinstance(value, Exception):
            self.logError()
            print traceback
            return value

Now I can easily work with a PDF file and be sure that errors are handled gracefully. In theory, all I need to do is something like this:

with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)
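
One detail of the with statement matters for a wrapper like this: the return value of __exit__ decides whether an exception raised inside the block is swallowed or re-raised. The sketch below (Python 3) illustrates this with a hypothetical Managed class standing in for PDFMinerWrapper; the names Managed and run are assumptions made for the illustration only.

```python
class Managed(object):
    """Toy context manager (hypothetical stand-in for PDFMinerWrapper)."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.closed = True  # cleanup runs whether or not an error occurred
        # A truthy return value tells Python to swallow the exception;
        # a falsy one (including None) lets it propagate.
        return self.suppress


def run(suppress):
    """Return (manager, True if the exception escaped the with block)."""
    manager = Managed(suppress)
    try:
        with manager:
            raise ValueError("boom")
    except ValueError:
        return manager, True
    return manager, False


swallowing, escaped_when_true = run(suppress=True)
propagating, escaped_when_false = run(suppress=False)
```

Since an ordinary exception instance is truthy, an __exit__ that returns the exception value would silence the error rather than hand it back to the caller, which may or may not be what is wanted.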

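This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.

I'm working with a class I call Pamplemousse, which serves as an interface for working with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object has been linked.

Here is some (abridged) code that demonstrates its use: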

class Pamplemousse(object):
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc

    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)

        return result

    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc

    def get_toc(self):
        return self.with_pdf(self._parse_toc)

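Whenever I want to perform an operation on the PDF file, I pass the relevant function and its arguments to the with_pdf method. The with_pdf method, in turn, uses the with statement to exploit the PDFMinerWrapper context manager (thus ensuring graceful handling of exceptions) and performs the check before actually applying the function that was passed in.

My question is as follows:

I'd like to simplify this code so that I don't have to call Pamplemousse.with_pdf explicitly. My understanding is that decorators could help here, so:

  1. How would I implement a decorator whose job is to invoke the with statement and perform the extractability check?
  2. Can the decorator be a class method, or must it be a free-standing function or class?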

4 Answers

1

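The way I interpret your goal is that you want to define multiple methods on your Pamplemousse class without constantly having to wrap them in that call. Here is a really simplified version of what it might look like: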

def if_extractable(fn):
    # this expects to be wrapping a Pamplemousse object
    def wrapped(self, *args):
        print "wrapper(): Calling %s with" % fn, args
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped


class Pamplemousse(object):

    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print "get_toc():", self, doc, args

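The decorator if_extractable is defined as just a function, but it expects to be used on instance methods of your class.

The decorated get_toc, which used to delegate to a private method, now simply expects to receive a doc object and the args, and only if the check passed. Otherwise it is never called and the wrapper returns None.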
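
To make that behavior concrete, here is a minimal sketch (Python 3) that can run without PDFMiner installed. FakeDoc and FakeWrapper are hypothetical stand-ins for the real document and for PDFMinerWrapper, and the use of functools.wraps is an optional addition that is not in the code above.

```python
import functools


class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""

    def __init__(self, is_extractable):
        self.is_extractable = is_extractable


class FakeWrapper(object):
    """Hypothetical stand-in for PDFMinerWrapper; the 'path' is already a doc."""

    def __init__(self, pdf_doc):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions


def if_extractable(fn):
    @functools.wraps(fn)  # keep the wrapped method's name and docstring
    def wrapped(self, *args):
        with FakeWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                return fn(self, doc, *args)
        return None  # the check failed, so fn was never called
    return wrapped


class Pamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    @if_extractable
    def get_toc(self, doc):
        return "toc of %r" % type(doc).__name__


ok = Pamplemousse(FakeDoc(True)).get_toc()
skipped = Pamplemousse(FakeDoc(False)).get_toc()
```

The caller never passes doc; the wrapper supplies it, and callers have to be prepared for a None result when the document is not extractable.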

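With this, you can go on defining your operation functions so that they expect a doc.

You could even add some type checking to make sure the decorator is wrapping the expected class: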

def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
answered 2012-07-26T00:30:54.233
0

A decorator is just a function that takes a function and returns another function. You can do anything you like inside it:

def my_func():
    return 'banana'

def my_decorator(f):  # note that it takes a function as an argument
    def wrapped():
        res = None
        # pdf_doc and passwd are assumed to be defined in an enclosing scope
        with PDFMinerWrapper(pdf_doc, passwd) as doc:
            res = f()
        return res
    return wrapped  # return a new function that also calls f

Now, if you apply the decorator:

@my_decorator
def my_func():
    return 'banana'

the wrapped function replaces my_func, so the extra code runs on every call.
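
As a small illustration of that replacement (Python 3), the sketch below uses a hypothetical decorator named shout that has nothing to do with PDFs. It shows that the @ syntax is just shorthand for calling the decorator by hand and rebinding the name.

```python
def shout(f):
    """Decorator: takes a function and returns a replacement function."""
    def wrapped():
        return f().upper() + "!"
    return wrapped


@shout
def decorated():
    return "banana"


def plain():
    return "banana"


# Applying the decorator by hand is equivalent to using the @ syntax.
manual = shout(plain)
```

Here decorated() and manual() both return "BANANA!", and decorated.__name__ is "wrapped", which shows that the name now refers to the inner function.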

answered 2012-07-25T23:31:29.727
0

You might want to try something along these lines:

def with_pdf(self, fn, *args):
    def wrappedfunc(*args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result
    return wrappedfunc

When you need to wrap a function, just do this:

@pamplemousseinstance.with_pdf
def foo(doc, *args):
    print 'I am doing stuff with', doc
    print 'I also got some good args. Take a look!', args
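
A self-contained sketch of this instance-bound approach (Python 3) might look as follows. FakeDoc and FakeWrapper are hypothetical stand-ins so that the example runs without PDFMiner, and with_pdf here takes only fn, since the decorator syntax passes nothing else.

```python
class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    is_extractable = True


class FakeWrapper(object):
    """Hypothetical stand-in for PDFMinerWrapper."""

    def __init__(self, pdf_doc, passwd=''):
        self.pdf_doc = pdf_doc
        self.passwd = passwd

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions


class Pamplemousse(object):
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    def with_pdf(self, fn):
        # Used as a decorator through a bound method, so self is this instance.
        def wrappedfunc(*args):
            result = None
            with FakeWrapper(self.pdf_doc, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args)
            return result
        return wrappedfunc


pamplemousseinstance = Pamplemousse(FakeDoc())


@pamplemousseinstance.with_pdf
def foo(doc, *args):
    return (type(doc).__name__, args)


result = foo(1, 2)
```

One consequence of this design is that foo is permanently tied to pamplemousseinstance; processing a different PDF means decorating the function again with another instance.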
answered 2012-07-25T23:35:19.663
0

Here is some demonstration code:

#! /usr/bin/python

class Doc(object):
    """Dummy PDFParser Object"""

    is_extractable = True
    text = ''

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, type, value, traceback):
        pass

def safe_with_pdf(fn):
    """
    This is the decorator, it gets passed the fn we want
    to decorate.

    The decorated function is defined inside a class, but at this
    point it is still a plain function. The instance only shows up
    later, as the first argument (self) of the wrapper below.
    """
    print "---- Decorator ----"
    print "safe_with_pdf: First arg (fn):", fn
    def wrapper(self, *args, **kargs):
        """
        This will get passed the functions arguments and kargs,
        which means that we can intercept them here.
        """
        print "--- We are now in the wrapper ---"
        print "wrapper: First arg (self):", self
        print "wrapper: Other args (*args):", args
        print "wrapper: Other kargs (**kargs):", kargs

        # This function is accessible because this function is
        # a closure, thus still has access to the decorators
        # ivars.
        print "wrapper: The function we run (fn):", fn

        # This wrapper is now pretending to be the original function

        # Perform all the checks and stuff
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Now call the original function with its
                # argument and pass it the doc
                result = fn(doc, *args, **kargs)
            else:
                result = None
        print "--- End of the Wrapper ---"
        return result

    # Decorators are expected to return a function, this
    # function is then run instead of the decorated function.
    # So instead of returning the original function we return the
    # wrapper. The wrapper will be run with the original functions
    # argument.

    # Now by using closures we can still access the original
    # functions by looking up fn (the argument that was passed
    # to this function) inside of the wrapper.
    print "--- Decorator ---"
    return wrapper


class SomeKlass(object):

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        print ''
        print "-- The Function --"

        # This function is now passed the doc from the wrapper.

        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        print "-- End of the Function --"
        print ''

doc = Doc()
doc.text = 'PDF contents'
klass = SomeKlass()  
klass.pdf = doc
klass.passwd = ''
klass.pdf_thing('arg')

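I suggest running the code to see how it works. Some interesting points to note:

First, you will notice that we pass only a single argument to pdf_thing(), yet if you look at the method it takes two arguments: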

@safe_with_pdf
def pdf_thing(doc, some_argument):
    print ''
    print "-- The Function --"

This is because, if you look at the wrapper, which handles all of our functions:

with PDFMinerWrapper(self.pdf, self.passwd) as doc:
    if doc.is_extractable:
        # Now call the original function with its
        # argument and pass it the doc
        result = fn(doc, *args, **kargs)

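we generate the doc argument and pass it in along with the original arguments (*args, **kargs). This means that every method or function wrapped with this decorator receives an additional doc argument on top of whatever the caller passes, which is why doc appears first in its declaration (def pdf_thing(doc, some_argument):).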
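
Stripped of the PDF details, the argument injection can be sketched like this (Python 3). The decorator name inject_doc and the dictionary used as a placeholder document are assumptions made for the illustration.

```python
def inject_doc(fn):
    """Decorator that supplies a hypothetical doc as the first argument."""
    def wrapper(*args, **kwargs):
        doc = {"text": "PDF contents"}  # placeholder for the real document
        return fn(doc, *args, **kwargs)
    return wrapper


@inject_doc
def pdf_thing(doc, some_argument):
    # Declared with two parameters...
    return "%s / %s" % (doc["text"], some_argument)


# ...but called with one; the wrapper fills in doc.
result = pdf_thing("arg")
```

The declared signature and the call signature differ by exactly the arguments the wrapper adds, which is worth documenting for anyone calling the decorated function.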

Another thing to note is that the wrapper:

def wrapper(self, *args, **kargs):
    """
    This will get passed the functions arguments and kargs,
    which means that we can intercept them here.
    """

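also captures the self argument and does not pass it on to the method being called. You can change this behavior by modifying the function call from: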

    result = fn(doc, *args, **kargs)
else:
    result = None

to:

    result = fn(self, doc, *args, **kargs)
else:
    result = None

and then changing the method itself to:

def pdf_thing(self, doc, some_argument):
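
Put together, the self-forwarding variant could look like the sketch below (Python 3). FakeDoc and FakeWrapper are hypothetical stand-ins for the document and for PDFMinerWrapper, and the label attribute exists only to show that the method can reach instance state again.

```python
class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    is_extractable = True
    text = 'PDF contents'


class FakeWrapper(object):
    """Hypothetical stand-in for PDFMinerWrapper."""

    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions


def safe_with_pdf(fn):
    def wrapper(self, *args, **kargs):
        with FakeWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Forward self as well, so fn behaves like a normal method.
                result = fn(self, doc, *args, **kargs)
            else:
                result = None
        return result
    return wrapper


class SomeKlass(object):
    def __init__(self, pdf, passwd=''):
        self.pdf = pdf
        self.passwd = passwd
        self.label = 'report'  # instance state the method can now use

    @safe_with_pdf
    def pdf_thing(self, doc, some_argument):
        return "%s: %s (%s)" % (self.label, doc.text, some_argument)


result = SomeKlass(FakeDoc()).pdf_thing('arg')
```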

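Hope that helps; feel free to ask for more clarification.

EDIT:

To answer the second part of your question:

Yes, it can be a method of the class. Just place safe_with_pdf inside SomeKlass, above any use of it, for example as the first method in the class.

Here is also a simplified version of the code above, with the decorator inside the class.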

class SomeKlass(object):
    def safe_with_pdf(fn):
        """The decorator which will wrap the method"""
        def wrapper(self, *args, **kargs):
            """The wrapper which will call the method is a doc"""
            with PDFMinerWrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args, **kargs)
                else:
                    result = None
            return result
        return wrapper

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        """The method to decorate"""
        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        return '%s - Result' % doc.text

print klass.pdf_thing('arg')
answered 2012-07-26T00:06:19.720