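I have already written a generator that does the trick, but I'd like to know the best way to implement the off-side rule.
In short: in this context, the off-side rule means that indentation is recognized as a syntactic element.
Here is the off-side rule in pseudocode, as a recipe for tokenizers that capture indentation in a usable form. I don't want to restrict answers to any particular language: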
token NEWLINE
    matches r"\n\ *"
    increase line count
    pick up and store the indentation level
    remember to also record the current parenthesis nesting depth

procedure layout tokens
    level = stack of indentation levels
    push 0 to level
    last_newline = none
    for each token
        if it is NEWLINE, store it in last_newline and get next token
        if last_newline contains something
            extract new_level and parenthesis_count from last_newline
            - if newline was inside parentheses, drop it and clear last_newline
            - if new_level > level.top
                push new_level to level
                emit last_newline as INDENT token and clear last_newline
            - if new_level == level.top
                emit last_newline and clear last_newline
            - otherwise
                while new_level < level.top
                    pop from level
                    if new_level > level.top
                        freak out, indentation is broken.
                    emit last_newline as DEDENT token
                clear last_newline
        emit token
    while level.top != 0
        emit a DEDENT token
        pop from level

comments are stripped before they reach the layouter
the layouter sits between the lexer and the parser
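For concreteness, here is a minimal Python sketch of the pseudocode above. It assumes a toy token set (names, integers, single-character operators), space-only indentation and `\n` line endings; the names `TOKEN_RE`, `Token`, `lex` and `layout` are hypothetical. The lexer records the indentation width and bracket depth on each NEWLINE, and the layouter is a generator that turns NEWLINEs into INDENT/DEDENT using a stack of levels, raising Python's built-in `IndentationError` where the pseudocode says "freak out".

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical token set: names, integers and single-character operators.
# Only "\n" line endings and space indentation are handled.
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n *)|(?P<COMMENT>#[^\n]*)|(?P<SKIP>[ \t]+)"
    r"|(?P<NAME>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[()\[\]{}:=+\-*/,])"
)

@dataclass
class Token:
    kind: str
    text: str
    line: int
    indent: int = 0   # NEWLINE only: indentation width of the line that follows
    parens: int = 0   # NEWLINE only: bracket depth when the newline was seen

def lex(source: str) -> Iterator[Token]:
    line, depth, pos = 1, 0, 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected {source[pos]!r} on line {line}")
        kind, text, pos = m.lastgroup, m.group(), m.end()
        if kind == "NEWLINE":
            line += 1  # the token describes the line that starts here
            yield Token(kind, text, line, indent=len(text) - 1, parens=depth)
        elif kind in ("SKIP", "COMMENT"):
            continue  # comments never reach the layouter
        else:
            if kind == "OP" and text in "([{":
                depth += 1
            elif kind == "OP" and text in ")]}":
                depth -= 1
            yield Token(kind, text, line)

def layout(tokens: Iterable[Token]) -> Iterator[Token]:
    levels = [0]          # stack of indentation levels
    last_newline = None   # most recent NEWLINE not yet handled
    line = 1
    for tok in tokens:
        if tok.kind == "NEWLINE":
            last_newline = tok  # later NEWLINEs overwrite earlier ones (blank lines)
            continue
        line = tok.line
        if last_newline is not None:
            nl, last_newline = last_newline, None
            if nl.parens > 0:
                pass  # newline inside brackets: drop it
            elif nl.indent > levels[-1]:
                levels.append(nl.indent)
                yield Token("INDENT", "", nl.line, nl.indent)
            elif nl.indent == levels[-1]:
                yield nl
            else:
                while nl.indent < levels[-1]:
                    levels.pop()
                    if nl.indent > levels[-1]:
                        raise IndentationError(f"inconsistent dedent on line {nl.line}")
                    yield Token("DEDENT", "", nl.line, nl.indent)
        yield tok
    while levels[-1] != 0:  # close every block still open at end of input
        levels.pop()
        yield Token("DEDENT", "", line)
```

For example, `"if x:\n    y = 1\n    z = 2\nw = 3\n"` lays out as `if x : INDENT y = 1 NEWLINE z = 2 DEDENT w = 3`. Note that no NEWLINE follows the DEDENT, and the trailing newline at end of input is simply absorbed.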
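This layouter never emits more than one NEWLINE in a row, and it emits no NEWLINE where the indentation changes: there the newline is replaced by an INDENT or DEDENT. Therefore the parsing rules stay quite simple. I think this works well, but please let me know if there is a better way to do it.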
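To illustrate why the parsing rules stay simple, here is a hypothetical recursive-descent parser over a toy grammar in which a statement is a bare name, and `NAME ":"` opens an indented block. It takes the layouter output as plain strings (`"NEWLINE"`, `"INDENT"`, `"DEDENT"`, `":"` and names). Because NEWLINE only ever separates statements at the same level, the one irregularity is that a compound statement ending in DEDENT is not followed by a NEWLINE.

```python
from typing import List, Tuple, Union

# A parse tree node is either a simple statement (its name) or (header, body).
Stmt = Union[str, Tuple[str, list]]

LAYOUT = {"NEWLINE", "INDENT", "DEDENT", ":", "EOF"}

class Parser:
    """Toy grammar (hypothetical):
         block := stmt (NEWLINE? stmt)*   NEWLINE omitted only after a compound stmt
         stmt  := NAME | NAME ":" INDENT block DEDENT
    """
    def __init__(self, tokens: List[str]) -> None:
        self.toks = list(tokens) + ["EOF"]
        self.i = 0

    def peek(self) -> str:
        return self.toks[self.i]

    def take(self, expected: str) -> None:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected}, got {self.peek()}")
        self.i += 1

    def block(self, end: str) -> List[Stmt]:
        stmts = [self.stmt()]
        while self.peek() != end:
            if self.peek() == "NEWLINE":
                self.i += 1
            elif not isinstance(stmts[-1], tuple):
                # No NEWLINE is emitted after DEDENT, so only a compound
                # statement may be followed directly by the next statement.
                raise SyntaxError(f"expected NEWLINE, got {self.peek()}")
            stmts.append(self.stmt())
        return stmts

    def stmt(self) -> Stmt:
        name = self.peek()
        if name in LAYOUT:
            raise SyntaxError(f"expected a name, got {name}")
        self.i += 1
        if self.peek() != ":":
            return name
        self.i += 1
        self.take("INDENT")
        body = self.block("DEDENT")
        self.take("DEDENT")
        return (name, body)

def parse(tokens: List[str]) -> List[Stmt]:
    return Parser(tokens).block("EOF")
```

For instance, `parse(["a", ":", "INDENT", "b", "NEWLINE", "c", "DEDENT", "d"])` yields `[("a", ["b", "c"]), "d"]`.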
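After using this for a while, I noticed that it may be nicer to emit a NEWLINE after DEDENTs as well. That way you can separate expressions with NEWLINE everywhere, while INDENT … DEDENT still acts as a trailer attached to an expression.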
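One way to get that behaviour without touching the layouter is a separate pass over the token kinds; the helper `newline_after_dedent` below is hypothetical, a sketch rather than part of the original design. After each run of DEDENTs that is followed by another token, it inserts a single NEWLINE; DEDENTs at the very end of input get none.

```python
from typing import Iterable, Iterator

def newline_after_dedent(kinds: Iterable[str]) -> Iterator[str]:
    """Insert one NEWLINE after each run of DEDENTs that is followed by a
    further token (and not already by a NEWLINE)."""
    after_dedent = False
    for k in kinds:
        if after_dedent and k not in ("DEDENT", "NEWLINE"):
            yield "NEWLINE"
        after_dedent = (k == "DEDENT")
        yield k
```

With this pass every statement boundary becomes a NEWLINE, so a grammar can use `stmt (NEWLINE stmt)*` uniformly. Alternatively, the layouter's dedent branch could emit `last_newline` once more after its DEDENT loop; that should have the same effect, since that branch only runs when another token follows.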