3

I'am writing some code to parse commands from the Simple Imperative Language defined in Theory of Programming Languages (Reynolds, 1998).

I have a lexer module that given a string extracts the tokens from it if it's a valid language expression and then I pass that list of tokens to the parser which should build an internal representation of the command (defined as an algebraic data type).

These are my Tokens:

--Tokens for the parser
data Token = Kw Keyword
| Num Int
| Op Operator
| Str String
| Sym Symbol
deriving Show 

I'm having trouble with binary operators. I'll put as an example the sum, but it happens the same with all of them, either boolean or integers.

For example if I'd run the program parse "x:=2+3" I should get the following list of tokens from the lexer
[Str "x", Op Colon, Op Equal, Num 2, OP, Plus, Num 3]
which is actually what I'm getting.

But then the parser should return the command
Assign "x" (Ibin Plus (Const 2) (Const 3)
which is the correct representation of the command. But instead of that I'm getting the following representation:
Assign "x" (Const 2)

I guess that I screwed it at some point in the pIntExpr function because the variable identifier and the := of the assignment are parsed OK and it's not parsing the last elements. Here are the relevant parsers for this example, to see if someone can orientate me in what I'm doing wrong.

-- Integer expressions
data IntExpr = Const Int 
         | Var Iden --Iden=String
         | Neg IntExpr
         | IBin OpInt IntExpr IntExpr
         deriving Show

type TParser = Parsec [Token] ()

--Internal representation of the commands
data Comm = Skip 
      | Assign Iden IntExpr
      | If Assert Comm Comm
      | Seq Comm Comm
      | While Assert Comm
      | Newvar Iden IntExpr Comm
        deriving Show

--Parser for non sequential commands
pComm' :: TParser Comm
pComm' = choice [pif,pskip,pAssign,pwhile,pNewVar]

--Parser for the assignment command
pAssign :: TParser Comm
pAssign = do v <- pvar
             _ <- silentOp Colon
             _ <- silentOp Equal
             e <- pIntExp
             return $ Assign v e 

-- Integer expressions parser
pIntExp :: TParser IntExpr
pIntExp = choice [ var'  --An intexp is either a variable
                  , num  --Or a numeric constant
                  , pMul --Or <intexp>x<intexp>
                  , pSum --Or <intexp>+<intexp>
                  , pRes --Or <intexp>-<intexp>
                  , pDiv --Division
                  , pMod --Modulus
                  , pNeg --Unary "-"
                 ]

-- Parser for <intexp>+<intexp>
pSum :: TParser IntExpr
pSum = do 
    e <- pIntExp
    _ <- silentOp Lexer.Plus
    e' <- pIntExp
    return $ IBin Lang.Plus e e'

UPDATE TAKING INTO ACCOUNT AndrewC's ANSWER
Unfortunately moving the var' parser down in the choice list didn't work, it yields the same result. But I took AndrewC's answer into account and tried to "manually" trace the execution (I'm not familiar with ghci's debugger and ended up doing lot of single steps and got lost eventually). This is how I reason it:

I got this token list from the lexer: [Str "x", Op Colon, Op Equal, Num 2, OP Plus, Num 3]

So, the pComm' parser fails with pif and pskip, but succeds with pAssign, consuming Str "x", Op Colon and Op Equal and trying to parse [Num 2, OP Plus, Num 3] with pIntExp (!!)

The pIntExp parser then tries the var' parser and fails, but succeds with the num parser consuming the Num 2 token and therefore returning the erroneous result Assign "x" (Const 2).

So with AndrewC's advice in mind about choice, I moved num parser down in the list too. For the sake of simplicity I'll consider pIntExp as choice [pSum, num, var´] that it's what's relevant for this particular example.

The first part of the reasoning remains the same. So I'll restart from (!!) where we had
[Num 2, Op Plus, Num 3] to be parsed by pIntExp
pIntExp tries now first with pSum, which in turn "calls" pIntExp again, which will try pSum again, and so the program hangs. I tried it and it indeed hangs and never ends.

So I was wondering if there's a form to make the pSum parser "lookahead" for the Op Plus token and then parse the corresponding expressions?

UPDATE 2: After "googling" a little bit more now that I've identified the problem I found that the combinational parsers chainl1 and/or chainl might be just what I need. I'll be playing with these and if I work it out post the solution

4

1 回答 1

4

选择函数按照它们在列表中的顺序尝试它给出的解析器。

由于您的变量解析器出现在您的解析器之前,用于更复杂的加法表达式,因此它在尝试另一个之前成功。

要解决这个问题,请将变量解析器放在任何以变量开头的表达式之后(并在使用choice.xml 时考虑任何其他子字符串匹配问题)。

类似的问题包括 3 - 4 + 1 评估为 -2。人们期望在没有其他优先级的情况下左关联(所以总和 - 术语而不是术语 - 总和)。

您可能也不希望 1 + 10 * 5 计算为 55,因此如果要实现运算符优先级,则必须小心 + 和 * 等。您可以通过将由乘法组成的表达式解析为项,然后将加法表达式解析为项的总和来实现此目的。

于 2013-04-29T06:36:27.413 回答