Hoping Paul McGuire might spot this and rescue me...
I've grabbed the "regex inverter" example script: http://pyparsing.wikispaces.com/file/view/invRegex.py
I'm trying to hack in support for Python named groups, e.g. (?P<blob_key>[a-zA-Z0-9-_=]+)
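For reference, here is a minimal sketch of what that named-group syntax does in Python's standard re module, which is the behaviour the inverter has to account for. The sample path is made up purely for illustration; only the pattern comes from my test case.

```python
import re

# The test pattern from this question, containing one named group.
PATTERN = r'serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/'

# Hypothetical sample path, invented purely for illustration.
SAMPLE = 'serve_blob/AB/abc-123_=/'

compiled = re.compile(PATTERN)
match = compiled.match(SAMPLE)

# groupindex maps each group name to its positional group number.
group_numbers = dict(compiled.groupindex)

# groupdict() returns the captured text keyed by group name.
captured = match.groupdict()

print(group_numbers)
print(captured)
```

As far as matching goes, the name is only a label on an ordinary capturing group, so I would expect the inverter to be able to treat the body of (?P<name>...) like any other parenthesized group.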
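I'm new to pyparsing, and I realise a regex parser is probably not the gentlest way to learn it (I just want to do something practical with the results).
I've edited the parser function as follows: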
def parser():
    global _parser
    if _parser is None:
        lbrack = Literal("[")
        rbrack = Literal("]")
        lbrace = Literal("{")
        rbrace = Literal("}")
        lparen = Literal("(")
        rparen = Literal(")")
        pyspec = Literal("?P")
        langle = Literal("<")
        rangle = Literal(">")

        reMacro = Combine("\\" + oneOf(list("dws")))
        escapedChar = ~reMacro + Combine("\\" + oneOf(list(printables)))
        reLiteralChar = "".join(c for c in printables if c not in r"\[]{}().*?+|")

        reRange = Combine(lbrack + SkipTo(rbrack,ignore=escapedChar) + rbrack)
        reLiteral = ( escapedChar | oneOf(list(reLiteralChar)) )
        reDot = Literal(".")
        repetition = (
            ( lbrace + Word(nums).setResultsName("count") + rbrace ) |
            ( lbrace + Word(nums).setResultsName("minCount")+","+ Word(nums).setResultsName("maxCount") + rbrace ) |
            oneOf(list("*+?"))
            )
        reNamedGroup = Combine(lparen + pyspec + langle + SkipTo(rangle) + rangle
                               + SkipTo(rparen, include=True) + rparen)

        reNamedGroup.setParseAction(handleNamedGroup)
        reRange.setParseAction(handleRange)
        reLiteral.setParseAction(handleLiteral)
        reMacro.setParseAction(handleMacro)
        reDot.setParseAction(handleDot)

        reTerm = ( reLiteral | reNamedGroup | reRange | reMacro | reDot )
        reExpr = operatorPrecedence( reTerm,
            [
            (repetition, 1, opAssoc.LEFT, handleRepetition),
            (None, 2, opAssoc.LEFT, handleSequence),
            (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
            ]
            )
        _parser = reExpr

    return _parser
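When I run it against my test regex, reNamedGroup seems to find and process the named group correctly (I stuck some logging into SkipTo and other methods...), yet it doesn't seem to take part in the output at all, and my handleNamedGroup function never gets called.
The log output looks like this: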
invert(r'serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/')
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleLiteral: ['s']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['r']
DEBUG:root: handleLiteral: ['v']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['_']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['l']
DEBUG:root: handleLiteral: ['o']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: handleRange: ['[A-Z]']
DEBUG:root: handleRepetition: [[[ABCDEFGHIJKLMNOPQRSTUVWXYZ], '{', '2', '}']]
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleSequence: [[Lit:s, Lit:e, Lit:r, Lit:v, Lit:e, Lit:_, Lit:b, Lit:l, Lit:o, Lit:b, Lit:/, <libs.exreg.exreg.GroupEmitter object at 0x34cfa30>, Lit:/]]
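The lines prefixed with ** are the skipRes values returned from ... SkipTo, and they look correct to me. The part that has me stumped is why they are being ignored.
I'm keenly aware that I'm just blindly copying and pasting things... I tried to closely mirror something that works, reRange... but the range works and my analogous bit doesn't.
My guess is that the surrounding parentheses somehow "hide" the parsed named group from the output at some later stage of parsing, but I can't see how.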