yacc - Questions about compiler construction (Flex/Bison)

Question

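I am trying to build a simple compiler for my class. It is the second week and I am completely stuck on the following points. Here is my simple.l (the flex and bison files are snipped to save space):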

..snip..
end     {return(END);}
skip        {return(SKIP);}
in      {return(IN);}
integer {return(INTEGER);}
let     {return(LET);}
..snip..
[ \t\n\r]+

and simple.y is:

%start program
%token LET IN END 
%token SKIP IF THEN ELSE WHILE DO READ WRITE FI ASSGNOP
%token NUMBER PERIOD COMMA SEMICOLON INTEGER
%token IDENTIFIER EWHILE LT
%left '-' '+'
%left '*' '/'
%right '^'
%%

program : LET declarations IN commands END SEMICOLON
declarations :  
|INTEGER id_seq IDENTIFIER PERIOD
;
id_seq:
|id_seq IDENTIFIER COMMA
;
commands : 
| commands command SEMICOLON
;
command : SKIP
;
exp : NUMBER
| IDENTIFIER
| '('exp')'
;
..snip..
%%

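My first problem is that when I compile and run it, it accepts my input correctly right up to the end, but it does not stop there; it seems to go back into the start state again. Shouldn't it terminate when it encounters end?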

For the input:

let
integer x.
in
skip;
end;

this is the output:

Starting parse
Entering state 0
Reading a token: let
Next token is token LET ()
Shifting token LET ()
Entering state 1
Reading a token: integer x.
Next token is token INTEGER ()
Shifting token INTEGER ()
Entering state 3
Reducing stack by rule 4 (line 22):
-> $$ = nterm id_seq ()
Stack now 0 1 3
Entering state 6
Reading a token: Next token is token IDENTIFIER ()
Shifting token IDENTIFIER ()
Entering state 8
Reading a token: Next token is token PERIOD ()
Shifting token PERIOD ()
Entering state 10
Reducing stack by rule 3 (line 20):
   $1 = token INTEGER ()
   $2 = nterm id_seq ()
   $3 = token IDENTIFIER ()
   $4 = token PERIOD ()
-> $$ = nterm declarations ()
Stack now 0 1
Entering state 4
Reading a token: in
Next token is token IN ()
Shifting token IN ()
Entering state 7
Reducing stack by rule 6 (line 25):
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: skip;
Next token is token SKIP ()
Shifting token SKIP ()
Entering state 13
Reducing stack by rule 8 (line 28):
   $1 = token SKIP ()
-> $$ = nterm command ()
Stack now 0 1 4 7 9
Entering state 19
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 29
Reducing stack by rule 7 (line 26):
   $1 = nterm commands ()
   $2 = nterm command ()
   $3 = token SEMICOLON ()
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: end;
Next token is token END ()
Shifting token END ()
Entering state 12
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 20
Reducing stack by rule 1 (line 18):
   $1 = token LET ()
   $2 = nterm declarations ()
   $3 = token IN ()
   $4 = nterm commands ()
   $5 = token END ()
   $6 = token SEMICOLON ()
-> $$ = nterm program ()
Stack now 0
Entering state 2
Reading a token:

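Why is it ready to read a token again after I enter end;? What am I missing? Shouldn't it have finished there? If I type anything at this point, it gives me the following error: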

Reading a token: let 
Next token is token LET ()
syntax error, unexpected LET, expecting $end
Error: popping nterm program ()
Stack now 0
Cleanup: discarding lookahead token LET ()
Stack now 0

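My second question is: what should the next step in implementing this compiler be? I mean, what steps are still needed between this and the code generation part? How do I implement a symbol table now? And how do I make this parser accept code from a file? Until now I have been typing the input in the terminal; what if I want it to accept code from a file such as my_program.simple? Thank you.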

score 1 · Accepted Answer

declarations :  
|INTEGER id_seq IDENTIFIER PERIOD
;
...

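I think you are using the wrong syntax: you declare that declarations (as well as id_seq and commands) can be epsilon, i.e. an empty production. That is because | is the alternative operator, so each of these rules is a choice between an empty body and the actual form. That does not make sense.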

I think that may be the reason your parser loops.

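For the symbol table, you can use a map declared as a global outside the parser (I hope you are generating C++), and then insert symbols as you see them.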
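A minimal sketch of such a global map, assuming the parser is compiled as C++. The names Symbol, symbol_table, declare_symbol, lookup_symbol and slot_of are made up for illustration, and the single slot field is only a guess at what you might want to store per identifier:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-identifier record; the field is illustrative only.
struct Symbol {
    int slot;  // storage slot, assigned in declaration order
};

// Global table, declared outside the parser so semantic actions can reach it.
std::map<std::string, Symbol> symbol_table;

// Insert a name when its declaration is seen.
// Returns false if the name was already declared.
bool declare_symbol(const std::string& name) {
    int next_slot = static_cast<int>(symbol_table.size());
    // emplace leaves the table unchanged if the name is already present.
    return symbol_table.emplace(name, Symbol{next_slot}).second;
}

// Returns a pointer to the record, or nullptr for an undeclared name.
const Symbol* lookup_symbol(const std::string& name) {
    auto it = symbol_table.find(name);
    return it == symbol_table.end() ? nullptr : &it->second;
}

// Slot of a name that must already be declared (checked in debug builds).
int slot_of(const std::string& name) {
    const Symbol* s = lookup_symbol(name);
    assert(s != nullptr && "use of undeclared identifier");
    return s->slot;
}
```

The action attached to the declarations rule could call declare_symbol for each IDENTIFIER, and the exp rule could call lookup_symbol to report undeclared names. That assumes your lexer hands the identifier text to the parser as a semantic value, which the snipped files do not show.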

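Before getting to a compiler, it may be useful to have a working interpreter: it is easier, and it clarifies many aspects that you will reuse when building the compiler.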
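To give a rough idea of what I mean by an interpreter, here is a sketch that evaluates only the two visible leaf alternatives of exp (NUMBER and IDENTIFIER) directly instead of emitting code for them. The Exp node, the Env type and the eval function are assumptions of mine; your snipped rules probably need more node kinds:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical AST node for the visible leaf alternatives of `exp`.
struct Exp {
    enum Kind { Number, Identifier } kind;
    int value;         // used when kind == Number
    std::string name;  // used when kind == Identifier
};

// Variable store; assumed to hold one int per declared identifier.
using Env = std::map<std::string, int>;

// Evaluate an expression node against the current variable store.
int eval(const Exp& e, const Env& env) {
    switch (e.kind) {
    case Exp::Number:
        return e.value;
    case Exp::Identifier: {
        auto it = env.find(e.name);
        assert(it != env.end() && "undeclared identifier");
        return it->second;
    }
    }
    return 0;  // not reached for valid nodes
}
```

A compiler would walk the same tree, but each case would emit an instruction (load a constant, load a variable) instead of returning the value, which is why the tree and the symbol handling carry over.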
