pyparsing - Using pyparsing as a query parser

Question

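I have just discovered the excellent pyparsing module, and I would like to use it to build a query parser.

Basically, I want to be able to parse expressions of the following kind:

'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'

where b_coherent, symbol and nucleon are keywords of a database.

I read carefully through one of the examples shipped with pyparsing (searchparser.py), which I think (I hope!) brings me quite close to my goal, but something is still wrong.

Here is my code: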

from pyparsing import *

logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])

value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value

selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)

parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection

grammar = parenthesis + lineEnd

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')

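I have some trouble fully understanding the Forward object. Maybe that is one of the reasons my parser does not work properly. Do you have any idea what is wrong with my grammar?

Thanks a lot for your help

Eric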

score 1 · Accepted Answer

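You can use Forward to hand-craft your own expression nesting within parentheses and so on, but pyparsing's operatorPrecedence simplifies this whole process. See below for an updated form of your original code, with comments: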

from pyparsing import *

# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
and_operator        = oneOf(['and','&'], caseless=True) 
or_operator         = oneOf(['or' ,'|'], caseless=True) 

# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# you won't have to keep updating as you add other terms to your query language
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')

# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])

# instead of generic 'value', define specific value types 
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')

# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false

# in future, you can expand comparison_operand to be its own operatorPrecedence 
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)

# all this business is taken care of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd

boolean_expr = operatorPrecedence(comparison_expr | boolean_literal, 
    [
    (not_operator, 1, opAssoc.RIGHT),
    (and_operator, 2, opAssoc.LEFT),
    (or_operator,  2, opAssoc.LEFT),
    ])
grammar = boolean_expr

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)

print(res.asList())

Prints:

[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]
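To see why 'and' and 'or' were split into separate operators, here is a minimal sketch that parses a query without any parentheses. It is an illustration, not part of the original answer: the field names `a`, `b`, `c` and the reduced operand set are assumptions, and it calls `infixNotation`, the name under which later pyparsing releases expose `operatorPrecedence`. Operators listed earlier bind more tightly, so the 'and' pair is grouped first.

```python
from pyparsing import (Group, QuotedString, Regex, Word, alphas, alphanums,
                       infixNotation, oneOf, opAssoc)

# Operands: a quoted string, an identifier, or an integer (reduced set for brevity)
ident = Word(alphas + '_', alphanums + '_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t: int(t[0]))
operand = QuotedString('"') | ident | integer

comparison_operator = oneOf(['==', '!=', '>', '>=', '<', '<='])
comparison_expr = Group(operand + comparison_operator + operand)

and_operator = oneOf(['and', '&'], caseless=True)
or_operator = oneOf(['or', '|'], caseless=True)

# 'and' is listed before 'or', so it has the higher precedence.
# The 2 in each tuple is the operator's arity (binary), not its precedence.
boolean_expr = infixNotation(comparison_expr, [
    (and_operator, 2, opAssoc.LEFT),
    (or_operator, 2, opAssoc.LEFT),
])

# Hypothetical query with no parentheses at all
result = boolean_expr.parseString('a == 1 or b == 2 and c == 3', parseAll=True).asList()
print(result)
# [[['a', '==', 1], 'or', [['b', '==', 2], 'and', ['c', '==', 3]]]]
```

With a single combined logical_operator, both operators would sit at the same level and the expression would come back as one flat left-to-right chain.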

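From here, I suggest you look into how to take the next step of creating something that can actually be evaluated. See the simpleBool.py example on the pyparsing wiki for how to do this with operatorPrecedence.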
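As a rough idea of what that next step could look like, the sketch below walks the nested list printed above and evaluates it against one record. This is not the simpleBool.py approach (which attaches classes as parse actions); it is a plain-Python illustration. The `evaluate` function, the `record` dict, and the convention that the left operand of a comparison is a field name and the right operand is a literal value are all assumptions made for this example.

```python
import operator

# Map each comparison token to the corresponding Python operator
COMPARISONS = {
    '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
}

def evaluate(node, record):
    """Evaluate a nested-list parse tree against a dict of field values."""
    if isinstance(node, str):
        # a bare boolean literal such as 'true' or 'false'
        return node.lower() == 'true'
    if len(node) == 1:
        # the outermost list simply wraps the whole expression
        return evaluate(node[0], record)
    if node[0] in ('not', '^'):
        return not evaluate(node[1], record)
    if isinstance(node[1], str) and node[1] in COMPARISONS:
        # assumption: left operand is a field name, right operand is a literal
        field, op, expected = node
        return COMPARISONS[op](record[field], expected)
    # otherwise a chain: operand, operator, operand, operator, operand, ...
    result = evaluate(node[0], record)
    for op, operand in zip(node[1::2], node[2::2]):
        if op in ('and', '&'):
            result = result and evaluate(operand, record)
        else:
            result = result or evaluate(operand, record)
    return result

# The tree printed by the answer's code, copied here as a literal
tree = [[['b_coherent', '==', '1_2'], 'or',
         [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

print(evaluate(tree, {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 4}))  # True
print(evaluate(tree, {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 3}))  # False
```

A real query layer would more likely translate the tree into a database filter than test records one by one, but the recursive walk has the same shape.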

I'm glad to hear you are enjoying pyparsing, welcome!

score 0 · Accepted Answer

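Forward is a forward declaration of an expression to be defined later. It is used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator.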
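A minimal sketch of that pattern, assuming a toy grammar of identifiers joined by 'and'/'or' with arbitrarily nested parentheses (the grammar and the names `atom` and `expr` are illustrative, not taken from the question's code):

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphas, alphanums, oneOf)

ident = Word(alphas + '_', alphanums + '_')
logical_op = oneOf(['and', 'or'])

# Declare the placeholder first, because atom needs to refer to expr
expr = Forward()

# An atom is an identifier or a parenthesized expr (this is the recursion)
atom = ident | Group(Suppress('(') + expr + Suppress(')'))

# Now that the definition is known, attach it to the existing Forward
# object with '<<'. Rebinding the name with '=' would break the recursion.
expr << (atom + ZeroOrMore(logical_op + atom))

result = expr.parseString('a and (b or (c and d))', parseAll=True).asList()
print(result)
# ['a', 'and', ['b', 'or', ['c', 'and', 'd']]]
```

In the question's code, the line `parenthesis = Combine(...) | selection` rebinds the name `parenthesis` to a new expression, so the parenthesized form is never part of the recursive definition.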
