17

我已经尝试使用此代码并将其转换为我正在从事的用于编程语言处理的项目,但我遇到了简化版本的问题:

op = oneOf( '+ - / *')
lparen, rparen = Literal('('), Literal(')')

expr = Forward()
expr << ( Word(nums) | ( expr + op + expr ) | ( lparen + expr + rparen) )

我已经对这个简单的设置进行了许多不同的修改。通常,尝试类似:

print(expr.parseString('1+2'))

将返回['1']。当我陷入深度递归时,例如:

print(expr.parseString('(1+2)'))

关于我无法解析任意算术表达式的简单递归,我缺少什么,例如1+(2 * 3-(4*(5+6)-(7))...

4

4 回答 4

27

哇,我猜 pyparsing 真的在地图上!感谢 Alex 和 John 介入这个问题。你们的回答都对标。但让我添加一两条评论:

  1. 如果我们抑制左括号和右括号符号,并使用 Group 对带括号的表达式进行分组,pyparsing 将得到更接近 AST 的结构化结果。

    from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf,Group
    
    def Syntax():
        op = oneOf('+ -')
        lpar  = Literal( '(' ).suppress()
        rpar  = Literal( ')' ).suppress()
        num = Word(nums)
        expr = Forward()
        atom = num | Group(lpar + expr + rpar)
        expr << atom + ZeroOrMore(op + atom)
        return expr
    
    if __name__ == "__main__":
        expr = Syntax()
        def test(s):
            results = expr.parseString(s)
            print s,'->', results
    
        test( "(9 + 3)" )
        test( "(9 + 3) * (4 / 5)" )
    

    给予:

    (9 + 3) -> [['9', '+', '3']]
    (9 + 3) * (4 / 5) -> [['9', '+', '3'], '*', ['4', '/', '5']]
    

    否则,pyparsing 只是标记化,您必须遍历已解析标记的列表才能找到嵌套表达式。

  2. 由于 op 仅定义为 oneOf("+ - * /"),因此没有操作的优先级。在https://github.com/pyparsing/pyparsing/tree/master/examples上的 pyparsing 存储库中有手动定义此方法 (fourFn.py) 或使用infixNotation帮助器 (simpleArith.py ) 的更新方法的示例)。同样,这让 pyparsing 增加了比标记化更多的价值。

对于 OP,请查看这些示例,我认为它们将帮助您推进您的项目。

——保罗

于 2009-08-29T03:02:04.717 回答
9

Is this more or less what you want...?

from pyparsing import Literal,Word,ZeroOrMore,Forward,nums,oneOf

def Syntax():
    op = oneOf( '+ - / *')
    lpar  = Literal( '(' )
    rpar  = Literal( ')' )
    num = Word(nums)

    expr = Forward()
    atom = num | ( lpar + expr + rpar )
    expr << atom + ZeroOrMore( op + expr )
    return expr


if __name__ == "__main__":

    expr = Syntax()

    def test(s):
        results = expr.parseString( s )
        print s,'->', results

    test( "(9 + 3)" )
    test( "(9 + 3) * (4 / 5)" )

emitting

(9 + 3) -> ['(', '9', '+', '3', ')']
(9 + 3) * (4 / 5) -> ['(', '9', '+', '3', ')', '*', '(', '4', '/', '5', ')']

? This "anchors" the recursion by separating an "atom" (number or parenthesized expression) from an "expression" (one or more "atoms" with operators in-between).

于 2009-08-28T05:19:59.487 回答
4

A grammar like:

expr :: expr op expr

is hard to work with because the recursion just keeps diving into the left.

A normal arithmetic grammar would look something like:

expr :: mulxp | mulxp '+' expr
mulxp :: atom | atom '*' expr
atom :: Word(nums) | '(' + expr + ')'

Basically, you never get S :: S; any time a nonterminal appears on the left and right hand sides of a line in the grammar, there must be some literal in the middle for the parser to consume.

于 2009-08-28T05:19:10.440 回答
0

用于operatorPrecedence构建表达式。它将构建正确的表达式,并在处理时处理运算符优先级:

num = Word(nums)
plusop = oneOf( '+ -')
multop = oneOf('/ *')
expr = operatorPrecedence(num,
                          [(multop, 2, opAssoc.LEFT),(plusop, 2, opAssoc.LEFT)])

例子:

>> print parsetime.expr.parseString("1+(2 * 3-(4*(5+6)-(7)))")
[['1', '+', [['2', '*', '3'], '-', [['4', '*', ['5', '+', '6']], '-', '7']]]]
于 2014-02-24T18:38:33.813 回答