parsing - Parse list with minimal separators

Question

I have a language with statements of 4 kinds: s00, s01, s10, s11 where a leading 1 means initial keyword, a trailing 1 means terminated, and I have a separator ";". I can terminate any statement with ";". I would like to parse a language allowing a list of statements which allows minimal use of ";". The parser is Dypgen which is GLR+.

Example:

{ x=1 fun f(){} x=1; x=1 var x=1 var x=1; x=1 }

Is it possible to do this at all? If so, how? If not, why?

I believe it can't be done, mainly because I can't think of how to do it :) However it does seem context sensitive: the rule is you have to insert a ";" between A and B if A is not terminated and B is not initiated, ditto for B and C which means B is used twice.
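The pairwise rule above can be made concrete with a small sketch. It is written in Python rather than Dypgen, and the function name `separators_needed` and the two-character kind strings are assumptions made for illustration: the first character is '1' if the statement starts with a keyword, the second is '1' if it is self-terminated. The classification of the example statements (x=1 as s00, fun f(){} as s11, var x=1 as s10) is also an assumption read off the example.

```python
def separators_needed(kinds):
    """Return each index i where a ';' is required between kinds[i] and kinds[i + 1].

    Each kind is a two-character string such as "10": the first character is
    '1' if the statement starts with a keyword, the second is '1' if the
    statement is self-terminated.
    """
    needed = []
    for i in range(len(kinds) - 1):
        a, b = kinds[i], kinds[i + 1]
        a_terminated = a[1] == "1"
        b_initiated = b[0] == "1"
        # A ';' is required only when nothing else marks the boundary.
        if not a_terminated and not b_initiated:
            needed.append(i)
    return needed


# Assumed kinds for: x=1 fun f(){} x=1 x=1 var x=1 var x=1 x=1
example = ["00", "11", "00", "00", "10", "10", "00"]
print(separators_needed(example))  # [2, 5]
```

Indices 2 and 5 are the two places where the example has a ";". Each check looks at one adjacent pair only, so at the level of statement kinds a single bit of state (was the previous statement terminated?) is enough to decide it.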

However because the parser is GLR+ it is tempting to just use

(s00|s01|s10|s11)*

as the rule, and if it misparses throw in a ";" (which is an s11 no-op) to resolve the ambiguity. It would be nicer if the parser would report a syntax error though. Perhaps this could be done when merging alternate s productions. The real problem is when they overlap instead of merging: if this occurs a program parse could explode.

score 1 · Accepted Answer

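I recently ran into a similar problem with toplevel phrases: some of them need a ;; to terminate the preceding phrase, while others (those beginning with a phrase-introducing keyword) do not. I solved it by splitting the syntactic category of phrases in two and writing rules for sequences of phrases that express this behavior. The cost is some duplication in the split grammar.

In your case, it would look something like this: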

sequence:
  | (s00 | s10) sequence_closed
  | (s01 | s11) sequence_open
  | ε

sequence_closed:
  | s10 sequence_closed
  | s11 sequence_open
  | ';' sequence_open
  | ε

sequence_open:
  | s00 sequence_closed
  | s01 sequence_open
  | s10 sequence_closed
  | s11 sequence_open
  | ε

If you want to allow superfluous separators (and you most likely do), it gets a bit more complex, but that is the idea.
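For comparison, the same two-state idea can be sketched as a hand-written recognizer. This is Python, not Dypgen, and it is only an illustration: the function `accepts`, the token strings, and the `allow_extra` flag are hypothetical names. The boolean state plays the role of the two nonterminals: after a terminated statement or a ";" it accepts any statement kind, keyword-initiated ones included, as the question's rule permits. The `allow_extra` flag shows one possible reading of "superfluous separators" (a ";" where none is required); it says nothing about how hard that is to express in a GLR grammar.

```python
def accepts(tokens, allow_extra=False):
    """Recognize a statement list over the tokens "00", "01", "10", "11", ";".

    may_start_bare is True when the next statement may begin without a
    keyword: at the start of the list, after a terminated statement, or
    after a ';'.
    """
    may_start_bare = True
    for tok in tokens:
        if tok == ";":
            if may_start_bare and not allow_extra:
                return False  # a ';' here is not required, so reject it
            may_start_bare = True
        else:
            initiated = tok[0] == "1"
            terminated = tok[1] == "1"
            if not initiated and not may_start_bare:
                return False  # a ';' is missing before this statement
            may_start_bare = terminated
    return True


# Assumed tokens for: x=1 fun f(){} x=1; x=1 var x=1 var x=1; x=1
print(accepts(["00", "11", "00", ";", "00", "10", "10", ";", "00"]))  # True
print(accepts(["00", "00"]))                                          # False
```

A hand-written loop like this has no ambiguity to resolve, so a missing ";" shows up as an immediate rejection rather than as competing parses.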
