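Hoping Paul McGuire spots this and comes to my rescue...

I have grabbed the "regex inverter" example script: http://pyparsing.wikispaces.com/file/view/invRegex.py

I'm trying to hack in support for Python named groups, e.g. (?P<blob_key>[a-zA-Z0-9-_=]+)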
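
For context, here is a minimal sketch of what the named-group syntax does in Python's own re module. The sample path is made up for illustration. The point is that dropping ?P<blob_key> leaves an ordinary group that matches the same text, so an inverter can in principle treat the name as decoration.

```python
import re

# The test pattern from this question, with and without the group name.
named = re.compile(r'serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/')
plain = re.compile(r'serve_blob/[A-Z]{2}/([a-zA-Z0-9-_=]+)/')

# Hypothetical input, not taken from the original post.
sample = 'serve_blob/AB/x9-_=/'

named_match = named.match(sample)
plain_match = plain.match(sample)

# The named group is reachable by name and by position.
by_name = named_match.group('blob_key')
by_index = named_match.group(1)

# The unnamed variant captures exactly the same text.
unnamed = plain_match.group(1)
print(by_name, by_index, unnamed)
```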

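I'm new to pyparsing, and I realise a regex parser is probably not the best way to learn it (I just want to do something practical with the result).

I have edited the parser function as follows: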

def parser():
    global _parser
    if _parser is None:
        lbrack = Literal("[")
        rbrack = Literal("]")
        lbrace = Literal("{")
        rbrace = Literal("}")
        lparen = Literal("(")
        rparen = Literal(")")
        pyspec = Literal("?P")
        langle = Literal("<")
        rangle = Literal(">")

        reMacro = Combine("\\" + oneOf(list("dws")))
        escapedChar = ~reMacro + Combine("\\" + oneOf(list(printables)))
        reLiteralChar = "".join(c for c in printables if c not in r"\[]{}().*?+|")

        reRange = Combine(lbrack + SkipTo(rbrack,ignore=escapedChar) + rbrack)
        reLiteral = ( escapedChar | oneOf(list(reLiteralChar)) )
        reDot = Literal(".")
        repetition = (
            ( lbrace + Word(nums).setResultsName("count") + rbrace ) |
            ( lbrace + Word(nums).setResultsName("minCount")+","+ Word(nums).setResultsName("maxCount") + rbrace ) |
            oneOf(list("*+?")) 
            )

        reNamedGroup = Combine(lparen + pyspec + langle + SkipTo(rangle) + rangle
                               + SkipTo(rparen, include=True) + rparen)

        reNamedGroup.setParseAction(handleNamedGroup)
        reRange.setParseAction(handleRange)
        reLiteral.setParseAction(handleLiteral)
        reMacro.setParseAction(handleMacro)
        reDot.setParseAction(handleDot)

        reTerm = ( reLiteral | reNamedGroup | reRange | reMacro | reDot )
        reExpr = operatorPrecedence( reTerm,
            [
            (repetition, 1, opAssoc.LEFT, handleRepetition),
            (None, 2, opAssoc.LEFT, handleSequence),
            (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
            ]
        )
        _parser = reExpr

    return _parser

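When I run it against my test regex, reNamedGroup appears to find and process the named group correctly (I stuck some logging into SkipTo and other methods...), but it does not seem to take part in the output at all, and my handleNamedGroup function is never called.

The log output looks like this: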

invert(r'serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/')
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleLiteral: ['s']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['r']
DEBUG:root: handleLiteral: ['v']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['_']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['l']
DEBUG:root: handleLiteral: ['o']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: handleRange: ['[A-Z]']
DEBUG:root: handleRepetition: [[[ABCDEFGHIJKLMNOPQRSTUVWXYZ], '{', '2', '}']]
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleSequence: [[Lit:s, Lit:e, Lit:r, Lit:v, Lit:e, Lit:_, Lit:b, Lit:l, Lit:o, Lit:b, Lit:/, <libs.exreg.exreg.GroupEmitter object at 0x34cfa30>, Lit:/]]

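The lines prefixed with ** are the skipRes values being returned from ... SkipTo, and they look correct to me. The part I'm stumped on is why they are being ignored.

I'm acutely aware that I'm just blindly copying and pasting things... I tried to closely copy what works for reRange... but the range works and my similar bit does not.

My guess is that the surrounding parentheses somehow "hide" the parsed named group from the output at some later stage of parsing, but I can't see how.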

1 Answer

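You don't want to do anything with the parentheses in your reNamedGroup expression. Notice that the grammar defines no other syntax for parenthesized re groups, yet they definitely work. In this parser, the parentheses are handled as part of the operatorPrecedence expression. Just change your definition of reNamedGroup to: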

reNamedGroup = pyspec + langle + SkipTo(rangle) + rangle

and let operatorPrecedence handle all the paren grouping.
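
To illustrate that point, here is a minimal sketch of a toy grammar, not the invRegex one. It uses infixNotation, the name newer pyparsing releases use for operatorPrecedence, and it assumes single-letter operands. Nothing in the grammar mentions parentheses, yet grouped input still parses, because the helper adds the paren handling itself.

```python
from pyparsing import Suppress, alphas, infixNotation, oneOf, opAssoc

# Operand: any single letter. Note there is no "(" or ")" expression here.
term = oneOf(list(alphas))

expr = infixNotation(term, [
    (None, 2, opAssoc.LEFT),           # juxtaposition, i.e. a sequence
    (Suppress("|"), 2, opAssoc.LEFT),  # alternation
])

# The parenthesized alternation comes back as a nested group.
result = expr.parseString("a(b|c)d", parseAll=True).asList()
print(result)  # [['a', ['b', 'c'], 'd']]
```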

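[edit by OP]
The change above only sort of worked: all the output for named groups began with either ? or P, so the ?P part (pyspec) was somehow leaking into the output. In the end I didn't need to rewrite it in the stack form (see comments); the following additional change made it work properly: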

reTerm = ( reLiteral | reRange | reMacro | reDot )
reExpr = operatorPrecedence( reTerm,
    [
    (reNamedGroup.suppress(), 1, opAssoc.RIGHT, handleNamedGroup),
    (repetition, 1, opAssoc.LEFT, handleRepetition),
    (None, 2, opAssoc.LEFT, handleSequence),
    (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
    ]
) 
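
The same idea in miniature, as a sketch under assumptions rather than the real invRegex grammar: the ?P<name> tag is registered as a suppressed prefix operator, so the handler sees only the operand. The single-letter operands, the handle_named_group body, and the "named(...)" marker are all invented for this illustration.

```python
from pyparsing import Literal, SkipTo, alphas, infixNotation, oneOf, opAssoc

langle, rangle = Literal("<"), Literal(">")
# The tag alone, with no parentheses, as suggested in the answer.
named_tag = Literal("?P") + langle + SkipTo(rangle) + rangle

def handle_named_group(tokens):
    # tokens[0] holds only the operand; the suppressed tag is not in it.
    return "named(" + "".join(tokens[0]) + ")"

term = oneOf(list(alphas))
expr = infixNotation(term, [
    # Suppressed unary prefix operator, highest precedence, as in the edit.
    (named_tag.suppress(), 1, opAssoc.RIGHT, handle_named_group),
    (None, 2, opAssoc.LEFT),  # juxtaposition, i.e. a sequence
])

result = expr.parseString("a(?P<key>b)d", parseAll=True).asList()
print(result)  # [['a', 'named(b)', 'd']]
```

Because the tag sits at the highest precedence level here, as far as I can tell it attaches to the next single term rather than to a whole sequence, so it is worth checking this against patterns whose named group contains more than one term.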
answered 2012-06-21T21:20:20.010