Given input with repeating BLOCKs, where each block contains repeating BEGIN EVENT and END EVENT entries (an END EVENT always follows a BEGIN EVENT):
[TIMESTAMP] BLOCK
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
...
[TIMESTAMP] BLOCK
How would you disambiguate this grammar with LR(1)? I am using LALRPOP, and the minimal example is:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Block = BlockHeader (Begin End)+;
pub Blocks = Block*;
Because LR(1) can only look ahead one token, this grammar is ambiguous, as LALRPOP helpfully tells you (partial error):
Local ambiguity detected
The problem arises after having observed the following symbols in the input:
BlockHeader (Begin End)+
At that point, if the next token is a `"[TIMESTAMP]"`, then the parser can proceed in two different ways.
First, the parser could execute the production at
/home/<snip>.lalrpop:51:9: 51:32, which would consume
the top 2 token(s) from the stack and produce a `Block`. This might then yield a parse tree like
BlockHeader (Begin End)+ Block
├─Block────────────────┤     │
├─Block+───────────────┘     │
└─Block+─────────────────────┘
Alternatively, the parser could shift the `"[TIMESTAMP]"` token and later use it to construct a
`Timestamp`. This might then yield a parse tree like
(Begin End)+ "[TIMESTAMP]" "BEGIN" "EVENT" End
│            ├─Timestamp─┘               │   │
│            └─Begin─────────────────────┘   │
└─(Begin End)+───────────────────────────────┘
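I can see it is telling me that, after parsing a BlockHeader, a Begin and an End, it cannot tell whether the next token starts another Begin or another Block. I have not found a way to disambiguate this in LR(1), but I can only assume this is a gap in my understanding rather than an inherent limitation of LR(1) grammars?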
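To check that I am reading the error correctly, here is a minimal sketch in plain Rust (not LALRPOP) of what the parser sees at the conflict point. The `Tok` enum and the helpers `lex_line`, `lookahead` and `second_token` are hypothetical names I made up for illustration. It only shows that a block header line and a begin line share the same first token, so the deciding token is the second one.

```rust
/// Tokens as the grammar above sees them (hypothetical names, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Timestamp,
    Block,
    Begin,
    End,
    Event,
}

/// Split a line such as "[TIMESTAMP] BEGIN EVENT" into tokens.
/// Returns None if any word is not one of the known terminals.
fn lex_line(line: &str) -> Option<Vec<Tok>> {
    line.split_whitespace()
        .map(|word| match word {
            "[TIMESTAMP]" => Some(Tok::Timestamp),
            "BLOCK" => Some(Tok::Block),
            "BEGIN" => Some(Tok::Begin),
            "END" => Some(Tok::End),
            "EVENT" => Some(Tok::Event),
            _ => None,
        })
        .collect()
}

/// The single token of lookahead available when the parser must decide
/// whether to reduce the current Block or keep extending (Begin End)+.
fn lookahead(line: &str) -> Option<Tok> {
    lex_line(line)?.first().copied()
}

/// The token that actually tells the two cases apart: the one after the timestamp.
fn second_token(line: &str) -> Option<Tok> {
    lex_line(line)?.get(1).copied()
}
```

With this sketch, `lookahead("[TIMESTAMP] BLOCK")` and `lookahead("[TIMESTAMP] BEGIN EVENT")` both give `Some(Tok::Timestamp)`, while `second_token` gives `Some(Tok::Block)` and `Some(Tok::Begin)` respectively. That matches my reading of the error: the decision has to be made one token before the distinguishing token is visible.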