antlr - ANTLR: input matches the grammar, but the parser does not recognize it

Question

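I am writing a parser for SML messages. Input: a file containing many SML messages. Output: a queue of messages with their recognized elements. This is my code: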

grammar SML;
options {language = Java;}
@header {
  package SECSParser;
 import SECSParser.SMLLexer;
}

@lexer::header {
  package SECSParser;
}

@parser::members {
  public static void main(String[] args) throws Exception {
    String file = "C:\\Messages.sml";
    SMLLexer lexer = new SMLLexer(new ANTLRFileStream(file));
    SMLParser parser = new SMLParser(new CommonTokenStream(lexer));
    parser.program();
  }
}

@lexer::members {
  public static String place = "end";
  public static void setPlace(String text) { SMLLexer.place = text; }
  public static String getPlace() {return SMLLexer.place;}
  public static boolean placeIsType() {
    return (SMLLexer.place.equals("wb")
    | SMLLexer.place.equals("value")
    | SMLLexer.place.equals("type"));
  }
  public static boolean placeIsStreamFunction() {
    return (SMLLexer.place.equals("sf") | SMLLexer.place.equals("name"));
  }
  public static boolean placeIsWaitBit() {
    return (SMLLexer.place.equals("sf") | SMLLexer.place.equals("wb"));
  }
  public boolean ahead() {
    if ((input.LA(-2) == 'S') || (input.LA(-2) == 's')) {
      return false;
    }
    return true;
  }
}

program:(message)* EOF;
message:{System.out.println("MESSAGE     : \n");}
  {SMLLexer.setPlace("name");}
  name ws* ':' ws* {SMLLexer.setPlace("sf");} str_func 
  (ws+ {SMLLexer.setPlace("wb");} waitbit)? (ws+ item)? '.' 
   ws* {SMLLexer.setPlace("end");};

name:LETTER(LETTER| NUMBER| '_')* {System.out.println("NAME     : " + $text + "\n");};
fragment STR:~('\'' | '\"');
NUMBER:'0'..'9';
LETTER:(('A'..'Z') | ('a'..'z'));
str_func: (('S' | 's') stream ('F' | 'f') function);
stream: NUMBER+ {System.out.println("STREAM     : " + $text + "\n");};
function: NUMBER+ {System.out.println("FUNCTION     : " + $text + "\n");};
waitbit: {SMLLexer.placeIsWaitBit()}?=>('W' | 'w') {
  System.out.println("WAITBIT     : " + $text + "\n");
};
item:{System.out.println("ITEM     : \n");} ws* SITEM ws* {SMLLexer.setPlace("type");}
  TYPE ( (ws* '[' number_item ']')? ws+ {SMLLexer.setPlace("value");}value)? 
  ws* EITEM ws* COMMENT? ws*;
SITEM: '<' {SMLLexer.setPlace("type");};
EITEM: '>';
TYPE:{SMLLexer.placeIsType()}?=>( 'A' | 'a' | 'L'| 'l'| 'BINARY'| 'binary'| 'BOOLEAN'| 'boolean'| 'JIS'| 'jis'| 'I8'| 'i8' | 'I1'| 'i1'| 'I2'| 'i2' | 'I4'| 'i4'| 'F4'| 'f4'| 'F8'| 'f8'| 'U8'| 'u8' | 'U1'| 'u1'| 'U2' | 'u2'| 'U4'| 'u4' ){System.out.println("TYPE     : " + $text + "\n");};
number_item: NUMBER+ {System.out.println("NUMBER ITEM     : " + $text + "\n");};
value:(item ws*)+| (string ws*)+| ((LETTER| NUMBER)ws*)+;
COMMENT:('/*' (options {greedy=false;}: .)* '*/') {$channel = HIDDEN;};
string:('\'' STR? '\'')| ('\"' STR? '\"') {System.out.println("VALUE     : " + $text + "\n");};
ANY:.;
ws:(' '| '\t'| '\r'| '\n'| '\f');

This is my file "Message.sml":

Are_You_There1l : S1F4 W.
On_Line_Data:S1F4 W
<L[2]
    <U4 13>
    <U4 7>
>.
W1Are_You_There: S1F4 W.

The result is:

MESSAGE     : 
NAME     : Are_You_There1l
STREAM     : 1
FUNCTION     : 4
WAITBIT     : W
MESSAGE     : 
NAME     : On_Line_Data
STREAM     : 1
FUNCTION     : 4
WAITBIT     : W
ITEM     : 
MESSAGE     : 
NAME     : L
TYPE     : U4
TYPE     : U4
MESSAGE     : 
NAME     : Are_You_There
STREAM     : 1
FUNCTION     : 4
WAITBIT     : W

**C:\Messages.sml line 4:1 mismatched input 'L' expecting TYPE
C:\Messages.sml line 4:2 mismatched input '[' expecting ':'**

I don't understand why my program does not recognize TYPE: 'L'. When I try TYPE 'U4', it works.

score 0 · Accepted Answer

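There are too many things going wrong to provide an answer to your problem. And even if your question were answered, it would not help, because the grammar contains too many errors. I recommend throwing it away and starting anew. But before starting anew, read through some ANTLR tutorials or get hold of a copy of The Definitive ANTLR Reference.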

Some issues:

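You don't seem to know the difference between parser rules and lexer rules. Some of your parser rules should be lexer rules, and some of your lexer rules should really be parser rules;
You are using fragment rules inside parser rules: this will never work, because a fragment rule never becomes a token on its own. Fragment rules may only be used inside lexer rules (or other fragment rules);
You are setting (static) lexer variables from your parser: you can't do that! The parser buffers tokens on its own, causing your logic to go horribly wrong. There is a strict separation between lexer and parser: the lexer just produces tokens, without any interference from the parser! Lexical analysis is a separate process. If you really need this, choose something other than ANTLR (Google for "scannerless parsing", "PEG" and/or "packrat"). This issue is most probably why 'L' is not being tokenized as a TYPE in your particular case;
You are using literal tokens like ('W' | 'w'), but you also have LETTER as a lexer rule. As a result, a single 'w' or 'W' will now never be tokenized as a LETTER. Defining literal tokens in parser rules is more or less the same as doing: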

W : 'w' | 'W';
LETTER : 'a'..'z' | 'A'..'Z'; // this will never match a 'w' or 'W' now!
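
The shadowing described above can be imitated in plain Java. This is a minimal sketch, not ANTLR's actual matching algorithm: the class `TokenOrderDemo` and its methods are hypothetical names, and the lexer is modelled simply as "try the rules in order, first match wins", with the implicit `W` rule placed before `LETTER`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not ANTLR's real implementation):
// overlapping rules are resolved in favour of the rule listed first.
class TokenOrderDemo {

    // Returns the token type for one character, trying the rules in order.
    static String classify(char c) {
        if (c == 'w' || c == 'W') {
            return "W";       // W : 'w' | 'W';
        }
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            return "LETTER";  // LETTER : 'a'..'z' | 'A'..'Z';
        }
        if (c >= '0' && c <= '9') {
            return "NUMBER";  // NUMBER : '0'..'9';
        }
        return "ANY";         // ANY : . ;
    }

    // Tokenizes the input one character at a time.
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        for (char c : input.toCharArray()) {
            types.add(classify(c));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Wow"));
    }
}
```

In this model `tokenize("Wow")` yields W, LETTER, W: the `LETTER` branch is never reached for 'w' or 'W', no matter where in the input the character appears or what the parser expects at that point.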

That too has to do with the fact that ANTLR's lexer operates independently from the parser.
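
To illustrate why state set from the parser reaches the lexer too late, here is a toy model in plain Java. It is a sketch under stated assumptions and does not reproduce the grammar in the question: `BufferingDemo`, `bufferAll`, `parseBuffered` and `parseInterleaved` are hypothetical names, and the model assumes the token stream reads every token before any parser action runs (how far ahead the real stream reads depends on the ANTLR version and on lookahead).

```java
import java.util.ArrayList;
import java.util.List;

class BufferingDemo {
    // Hypothetical shared state, similar in spirit to the static 'place' field.
    static String place = "end";

    // Toy lexer rule: 'L' is a TYPE only while place is "type".
    static String nextTokenType(char c) {
        if (c == '<') return "SITEM";
        if (c == 'L' && place.equals("type")) return "TYPE";
        if (Character.isLetter(c)) return "LETTER";
        return "ANY";
    }

    // Toy token stream: tokenizes the whole input up front.
    static List<String> bufferAll(String input) {
        List<String> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            tokens.add(nextTokenType(c));
        }
        return tokens;
    }

    // Parser working on a pre-filled buffer (the assumed real behaviour).
    static List<String> parseBuffered(String input) {
        place = "end";
        List<String> tokens = bufferAll(input); // lexing is already finished here
        List<String> seen = new ArrayList<>();
        for (String t : tokens) {
            if (t.equals("SITEM")) place = "type"; // too late to affect the lexer
            seen.add(t);
        }
        return seen;
    }

    // What the grammar's author implicitly expects: lexing and parsing in lockstep.
    static List<String> parseInterleaved(String input) {
        place = "end";
        List<String> seen = new ArrayList<>();
        for (char c : input.toCharArray()) {
            String t = nextTokenType(c);
            if (t.equals("SITEM")) place = "type"; // takes effect before the next token
            seen.add(t);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(parseBuffered("<L"));
        System.out.println(parseInterleaved("<L"));
    }
}
```

In the buffered variant the 'L' has already been classified as LETTER by the time the parser action runs, so the parser sees SITEM, LETTER. Only the interleaved variant produces SITEM, TYPE, and that lockstep is exactly what a separate lexing phase does not guarantee.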

Again: IMO, you really need to get a grip on the basics before continuing.

Best of luck!
