
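I just discovered the excellent pyparsing module, and I would like to use it to build a query parser.

Basically, I would like to be able to parse expressions of the following kind:

'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'

where b_coherent, symbol and nucleon are keywords of a database.

I carefully read one of the examples shipped with pyparsing (searchparser.py), which I think (I hope!) brought me quite close to my goal, but something is still wrong.

Here is my code: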

from pyparsing import *

logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])

value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value

selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)

parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection

grammar = parenthesis + lineEnd

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')

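I have some trouble fully understanding the Forward object. Maybe that is one of the reasons why my parser does not work properly. Do you have any idea what is wrong with my grammar?

Thanks a lot for your help,

Eric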


2 Answers


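You could use Forward to hand-craft your own expression nesting within parentheses and so on, but pyparsing's operatorPrecedence simplifies this whole process. See below an updated form of your original code, with comments: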

from pyparsing import *

# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
and_operator        = oneOf(['and','&'], caseless=True) 
or_operator         = oneOf(['or' ,'|'], caseless=True) 

# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# you won't have to keep updating as you add other terms to your query language
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')

# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])

# instead of generic 'value', define specific value types 
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')

# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false

# in future, you can expand comparison_operand to be its own operatorPrecedence 
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)

# all this business is taken care of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd

boolean_expr = operatorPrecedence(comparison_expr | boolean_literal, 
    [
    (not_operator, 1, opAssoc.RIGHT),
    (and_operator, 2, opAssoc.LEFT),
    (or_operator,  2, opAssoc.LEFT),
    ])
grammar = boolean_expr

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)

print(res.asList())

prints:

[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

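From here, I suggest you study how to take the next step and create something that can actually be evaluated. See the simpleBool.py example on the pyparsing wiki for how to do this using operatorPrecedence.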
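As a rough illustration of that next step (this is not the simpleBool.py approach, which attaches classes to the grammar), here is a minimal sketch that walks the nested list returned by asList() and evaluates it against a single record. The function name `evaluate`, the `record` dict, and the convention that the left operand of a comparison names a field while the right operand is a literal value are all assumptions made for this example.

```python
import operator

# Map comparison tokens produced by the grammar to Python functions.
COMPARE = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def evaluate(node, record):
    """Evaluate a nested-list parse result against one record (a dict)."""
    if isinstance(node, str):
        # Assumed convention: boolean literals arrive as plain strings.
        return node.lower() == "true"
    if len(node) == 1:
        # Outermost wrapper list added by the parser.
        return evaluate(node[0], record)
    if len(node) == 2:
        # ['not', operand]
        return not evaluate(node[1], record)
    if isinstance(node[1], str) and node[1] in COMPARE:
        # [field, op, value]: assumes the left side names a field in the record.
        field, op, value = node
        return COMPARE[op](record[field], value)
    # [operand, 'and'|'or', operand, ...]: fold from left to right.
    result = evaluate(node[0], record)
    for op, rhs in zip(node[1::2], node[2::2]):
        value = evaluate(rhs, record)
        if op.lower() in ("and", "&"):
            result = result and value
        else:
            result = result or value
    return result

# The parse result shown above, pasted in as a plain list.
parsed = [[['b_coherent', '==', '1_2'], 'or',
           [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

record = {'b_coherent': 'x', 'symbol': 2, 'nucleon': 4}  # hypothetical row
print(evaluate(parsed, record))
```

Working on the plain list keeps the evaluator independent of pyparsing, at the cost of losing the distinction between a quoted string and a bare identifier once the quotes have been stripped.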

I'm glad to hear you are enjoying pyparsing, welcome!

answered 2012-11-01T09:57:20.730

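Forward is a forward declaration of an expression to be defined later, used for recursive grammars such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator.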
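To make this concrete, here is a small sketch of a recursive grammar built with Forward. It is a simplified stand-in for the grammar in the question, not a fix of it: the names `expr`, `term`, `comparison` and `atom` are invented for the example, and it assumes a pyparsing version that supports the `<<=` operator. One likely problem in the question's code is that the name `parenthesis` is rebound to a new expression right after the `<<` line, so the recursion inside the Forward never sees the parenthesized form. Here the Forward keeps its name, and the parenthesized alternative refers back to it.

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphanums, oneOf)

# Declare the recursive expression first; its definition comes later.
expr = Forward()

term = Word(alphanums + "_")
comparison = Group(term + oneOf("== !=") + term)

# An atom is either a comparison or a parenthesized sub-expression,
# which refers back to 'expr' before 'expr' has been defined.
atom = comparison | Group(Suppress("(") + expr + Suppress(")"))

# Now that 'atom' exists, fill in the Forward. With plain '<<', a
# statement like 'expr << a | b' would bind as '(expr << a) | b',
# which is why '<<=' is used here.
expr <<= atom + ZeroOrMore(oneOf("and or") + atom)

result = expr.parseString("a == 1 or (b == 2 and c != 3)",
                          parseAll=True).asList()
print(result)
```

Unlike operatorPrecedence, this sketch gives 'and' and 'or' the same precedence, so grouping has to be spelled out with parentheses in the input.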

answered 2012-10-31T16:21:40.940