c++ - The last character of a scanned test file is reported on a line it is not actually on

Question

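Good evening everyone. I am developing a scanner for a compilers class. We have a test file that we must scan, printing the line each token is on, what the token is, and its ID number. The program works correctly, except for the last character in the test file, which is a . (period). That period is actually on line 17, but my scanner reports it on line 18, along with the EOF token. I am hoping a fresh pair of eyes can spot what I am missing. All other tokens are reported on their correct lines. Here is the scanner itself. The scanner has several other functions, but I don't think they are needed for this question.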

void scanner(FILE *file) {
const int FINAL_STATE = -1, ERROR_STATE = -2;
char next_char;
static int line_count = 1;
string s = "";
int next_state, state = 0;

while(state != FINAL_STATE) {
  next_char = get_char(file);

  // deal with comments
  if(next_char == '&') {
     next_char = get_char(file);
     while(next_char != '\n') {
        next_char = get_char(file);
        if (next_char == '\n') {
            line_count++;
        }
     }
     continue;
  }

  // count lines
  if(next_char == '\n') {
     line_count++;
  }

  // deal with EOF
  if(next_char == EOF) {
     tk.lexeme = "EOF";
     tk.tk_num = eof_tk;
     tk.line_num = line_count;
     return;
  }
  next_state = table[state][c_val(next_char)];
  if(next_state == ERROR_STATE) {
     cout << "error on line [" << line_count << "]\n";
     exit(0);
  }

  // deal with final state         <------------I think my problem is here
  if(next_state == FINAL_STATE) {
     if(!isspace(next_char)) {
        ungetc(next_char, file);
     }

     if(table[state][1] == id_tk) {
        for(int t = 0; t < size(keywords); t++) {
           if(keywords[t].compare(s) == 0) {
              tk.lexeme = s;
              tk.tk_num = key_assign(t);
              tk.line_num = line_count;
              return;
           }

           else {
              tk.lexeme = s;
              tk.tk_num = id_tk;
              tk.line_num = line_count;
           }
        }

     if(tk.lexeme == "") {
        tk.lexeme = s;                                              
     }
     }

     else {
        tk.lexeme = s;                                      // string
        tk.tk_num = (token_type)table[state][1];            // type
        tk.line_num = line_count;                           // line
     }

     return;
  }

  state = next_state;

  if(!isspace(next_char)) {
     s += next_char;
  }
 }
}

Here is the call to the scanner function in main:

 while(!feof(fp)) {
        scanner(fp);
        cout << "Line: " << tk.line_num << " Token: " << tk.lexeme << " Instance: " << tk.tk_num << endl;
    }

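If more code is needed I am happy to edit this post, but I did not want to overload it with code. Last but not least, here is the format of the test file: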

& First list of all separetd by spaces to make sure nothing is missing

qwerty uiop asdfg hjkl zxcv bnm a12345 a67890 a_ a_b abcdefghij

Start Stop Then If Iff While Var Int Float Do Read Write Void Return Dummy Program

= == < > !  +  -  *  / %  =< =>

. (  ) , { } ; [ ] :

12345 67890 001 0123456789

& now some tokens without space separators

Start_ Start.Stop Start+Stop Then=If If==Iff WhileInt start stop

x=a x==a x<=1 x>=2 x,y(z){x;y:u}[1,2,3]. <-------- This period

Also, here is the program's output. Note that these are only the last few lines.

Line: 17 Token: y Instance: 1
Line: 17 Token: : Instance: 10
Line: 17 Token: u Instance: 1
Line: 17 Token: } Instance: 22
Line: 17 Token: [ Instance: 24
Line: 17 Token: 1 Instance: 2
Line: 17 Token: , Instance: 20
Line: 17 Token: 2 Instance: 2
Line: 17 Token: , Instance: 20
Line: 17 Token: 3 Instance: 2
Line: 17 Token: ] Instance: 25
Line: 18 Token: . Instance: 17        <---------This last token should be on 17
Line: 18 Token: EOF Instance: 0

Thanks to everyone for taking a look. I appreciate it.

score 2 · Accepted Answer

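It looks like your scanner increments the line number before the result is printed. The next_char after the . is \n (most text editors insert a hidden newline at the end of the file), so perhaps line_count is incremented too early?

I would try removing the final \n from the file and see whether that changes the result.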
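One way to avoid the early increment is to record the line number when a token starts, rather than when the scanner has already read the character that ends it. The following is only a minimal sketch of that idea, not the asker's table-driven scanner: it splits on whitespace, and `Token`, `scan_lines`, and `start_line` are hypothetical names introduced for illustration.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token record for this sketch (not the asker's `tk` struct).
struct Token {
    std::string lexeme;
    int line;
};

// Splits `text` into whitespace-separated tokens. The line number is stored
// when the first character of a token is seen, so a newline consumed while
// looking for the end of the token cannot change it.
std::vector<Token> scan_lines(const std::string& text) {
    std::vector<Token> tokens;
    std::string current;
    int line = 1;        // line of the character being examined
    int start_line = 1;  // line on which `current` began

    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (!current.empty()) {
                tokens.push_back({current, start_line});
                current.clear();
            }
            if (c == '\n') {
                ++line;  // counted only after the pending token is emitted
            }
        } else {
            if (current.empty()) {
                start_line = line;
            }
            current += c;
        }
    }
    if (!current.empty()) {
        tokens.push_back({current, start_line});
    }
    return tokens;
}
```

In the question's scanner, the equivalent change would be to assign tk.line_num from a value saved before the terminating '\n' is counted.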

score 2 · Accepted Answer

@diclophis has explained one of your problems well.

(get_char() is not shown, so I am assuming it behaves like getchar().)

1. Incorrect EOF test

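if(next_char == EOF) { is wrong. next_char has type char, while EOF has type int. You can read a byte that has the same 8-bit pattern as EOF but is not EOF, and then exit on the wrong byte. Fix this by declaring int next_char and making sure get_char() returns an int, like getchar().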
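The collision can be shown in isolation. This is a small sketch, assuming EOF is -1 and that narrowing 0xFF to an 8-bit signed type yields -1, as on common implementations. The two function names are hypothetical, and `signed char` is used explicitly because the signedness of plain `char` varies by platform.

```cpp
#include <cstdio>

// Returns true when `byte` (0..255), once narrowed to a signed 8-bit char,
// compares equal to EOF. This models storing the result of a read in a char.
bool narrowed_matches_eof(int byte) {
    signed char narrowed = static_cast<signed char>(byte);
    return narrowed == EOF;
}

// Keeping the value in an int leaves every valid byte distinct from EOF.
bool int_matches_eof(int byte) {
    return byte == EOF;
}
```

Under those assumptions, the valid byte 0xFF is mistaken for EOF in the narrowed version but not in the int version. On platforms where plain char is unsigned, the opposite failure occurs: the comparison with EOF never succeeds.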

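2. Potential infinite loop

If '&' is the last byte in the file (or, more generally, if no '\n' follows it before the end of the file), this loop never exits.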

if(next_char == '&') {
  next_char = get_char(file);
  while(next_char != '\n') {
    ...
    }
 }
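A comment-skipping loop that also stops at end of file might look like the sketch below. It uses std::fgetc in place of the asker's get_char(), which is not shown, and `skip_comment` is a hypothetical helper name.

```cpp
#include <cstdio>

// Consumes characters up to and including the next '\n', or up to end of
// file, whichever comes first. Returns the number of newlines consumed
// (0 or 1) so the caller can update its own line counter.
int skip_comment(std::FILE* file) {
    int c;  // int, so that EOF can be told apart from every valid byte
    while ((c = std::fgetc(file)) != EOF) {
        if (c == '\n') {
            return 1;
        }
    }
    return 0;  // reached end of file without seeing a newline
}
```

The caller would invoke this right after reading '&' and add the return value to line_count.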

3. Incorrect feof() test. feof() returns true only after an attempted read has already found no more data, not when the next read is about to fail.

while(!feof(fp)) {

Recommended idiom

int next_char;
while((next_char = get_char()) != EOF) {
  ...
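For reference, here is a complete version of that idiom as a small sketch. It reads with std::fgetc rather than the asker's get_char(), and `Counts` and `count_bytes_and_newlines` are hypothetical names used only for this illustration.

```cpp
#include <cstdio>

// Hypothetical result type for this sketch.
struct Counts {
    long bytes;
    int newlines;
};

// Reads `file` to the end using the "assign, then compare with EOF" idiom.
// The loop condition is the read itself, so no separate feof() test is
// needed and the body never runs on a failed read.
Counts count_bytes_and_newlines(std::FILE* file) {
    Counts counts{0, 0};
    int c;  // int, not char, so EOF stays distinct from every valid byte
    while ((c = std::fgetc(file)) != EOF) {
        ++counts.bytes;
        if (c == '\n') {
            ++counts.newlines;
        }
    }
    return counts;
}
```

After the loop ends, feof(file) or ferror(file) can be checked to tell a normal end of file from a read error.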
