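A bit late, but a Google search for "pyparsing reentrancy" turns up this thread, so here is my answer.
I solved the problem of parser instance reuse/reentrancy by attaching the context to the string being parsed. You subclass str, put your context in an attribute of the new str class, pass an instance of it to pyparsing, and get the context back inside the parse action.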
Python 2.7:
from pyparsing import LineStart, LineEnd, Word, alphas, Optional, Regex, Keyword, OneOrMore

# subclass str; note that unicode is not handled
class SpecStr(str):
    context = None  # will be set in spec_string() below

    # override as pyparsing calls str.expandtabs by default
    def expandtabs(self, tabs=8):
        ret = type(self)(super(SpecStr, self).expandtabs(tabs))
        ret.context = self.context
        return ret

# set context here rather than in the constructor
# to avoid messing with str.__new__ and super()
def spec_string(s, context):
    ret = SpecStr(s)
    ret.context = context
    return ret

class Actor(object):
    def __init__(self):
        self.namespace = {}

    def pair_parsed(self, instring, loc, tok):
        self.namespace[tok.key] = tok.value

    def include_parsed(self, instring, loc, tok):
        # doc = open(tok.filename.strip()).read()  # would use this line in real life
        doc = included_doc  # included_doc is defined below
        parse(doc, self)  # <<<<< recursion

def make_parser(actor_type):
    def make_action(fun):  # expects fun to be an unbound method of Actor
        def action(instring, loc, tok):
            if isinstance(instring, SpecStr):
                return fun(instring.context, instring, loc, tok)
            return None  # None as a result of parse actions means
                         # the tokens have not been changed
        return action

    # Sample grammar: a sequence of lines,
    # each line is either 'key=value' pair or '#include filename'
    Ident = Word(alphas)
    RestOfLine = Regex('.*')
    Pair = (Ident('key') + '=' +
            RestOfLine('value')).setParseAction(make_action(actor_type.pair_parsed))
    Include = (Keyword('#include') +
               RestOfLine('filename')).setParseAction(make_action(actor_type.include_parsed))
    Line = (LineStart() + Optional(Pair | Include) + LineEnd())
    Document = OneOrMore(Line)
    return Document

Parser = make_parser(Actor)

def parse(instring, actor=None):
    if actor is not None:
        instring = spec_string(instring, actor)
    return Parser.parseString(instring)

included_doc = 'parrot=dead'
main_doc = """\
#include included_doc
ham = None
spam = ham"""

# parsing without context is ok
print 'parsed data:', parse(main_doc)

actor = Actor()
parse(main_doc, actor)
print 'resulting namespace:', actor.namespace
This outputs:
['#include', 'included_doc', '\n', 'ham', '=', 'None', '\n', 'spam', '=', 'ham']
{'ham': 'None', 'parrot': 'dead', 'spam': 'ham'}
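The expandtabs override in the code above matters because built-in str methods return plain str objects, so a subclass instance (and any attribute attached to it) is lost after such a call. Below is a minimal sketch of that behaviour without pyparsing; the Tagged class name and the sample context dict are hypothetical and exist only for this illustration.

```python
from __future__ import print_function  # keeps the sketch usable on 2.7 as well


class Tagged(str):
    """Hypothetical str subclass carrying an extra attribute."""
    context = None

    def expandtabs(self, tabs=8):
        # Re-wrap the plain str result and copy the context across.
        ret = type(self)(super(Tagged, self).expandtabs(tabs))
        ret.context = self.context
        return ret


s = Tagged("key\t= value")
s.context = {"owner": "demo"}  # illustrative context object

expanded = s.expandtabs()  # overridden: stays Tagged and keeps its context
upper = s.upper()          # not overridden: falls back to a plain str

print(type(expanded).__name__, expanded.context)
print(type(upper).__name__, hasattr(upper, "context"))
```

Only methods that are explicitly overridden preserve the subclass, which is why the answer overrides the one method it names pyparsing as calling on the input.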
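This approach makes the Parser itself fully reusable and reentrant. pyparsing's internals are generally reentrant as well, as long as you don't touch the static fields of ParserElement. The only drawback is that pyparsing resets its packrat cache on every call to parseString, but this can be worked around by overriding SpecStr.__hash__ (so that it hashes like object, not like str) plus some monkeypatching. On my dataset this is not a problem at all: the performance penalty is negligible, and it even helps memory usage.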
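The __hash__ override mentioned above could look roughly like the sketch below. It covers only the hashing half, and it assumes the intent is for each SpecStr instance to hash by identity, so that two inputs with identical text are treated as distinct keys by anything that stores them in a dictionary. The accompanying monkeypatching of pyparsing is not shown, since the answer does not spell it out. SpecStr is trimmed down here (no expandtabs override) to keep the example short.

```python
from __future__ import print_function  # keeps the sketch usable on 2.7 as well


class SpecStr(str):
    context = None

    def __hash__(self):
        # Hash by identity, as a plain object would, instead of by text.
        return object.__hash__(self)


a = SpecStr("ham = None")
b = SpecStr("ham = None")

cache = {a: "entry for a"}  # stand-in for a cache keyed on the input string

print(a == b)      # equality is still text-based
print(a in cache)  # same instance: found
print(b in cache)  # equal text, different instance: not found
```

Because equality stays text-based while hashing becomes identity-based, such objects are best kept out of general-purpose dictionaries and sets and used as keys only in this caching scenario.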