antlr - Parsing text without explicit end tags

Question

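The problem with parsing dictionary entries (see the examples below) is that there are no explicit start and end tags. Instead:

the end tag of one element is already the start tag of the next element
or: the start tag is not a syntactic element but the current parse state (so it depends on what you have already "seen" in the input stream)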

Example 1, simple input:

wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF

Example 2, entry with multiple definitions:

wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF

In words, or as pseudocode, I would describe it like this:

dictionary-entry : 
     word = .+ ' ' // catch everything as word until you see a space
     phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
     (MultipleMeaning | UniqueMeaning)

MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number 
                                             // before the definition

UniqueMeaning : definition= .+ ':'
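
To make the intended result concrete, here is a minimal sketch in plain Java (not ANTLR) of the split described by the pseudocode above. The class name `EntrySketch` and the method `parse` are hypothetical, and the sketch assumes that every numbered definition starts on a new line, as in Example 2. It only illustrates the desired output and is not meant as a replacement for a grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only meant to illustrate the expected split of an entry.
class EntrySketch {

    // Word: everything up to the first space. Phonetics: everything inside the brackets.
    private static final Pattern HEAD =
            Pattern.compile("^(\\S+) \\[([^\\]]*)\\]\\s*");

    // One meaning: optional "N." prefix, a definition up to the first colon, then an
    // example that runs until the next line starting with "N." or the end of input.
    private static final Pattern MEANING =
            Pattern.compile("\\s*(?:\\d+\\.\\s*)?([^:]*):\\s*(.*?)\\s*(?=\\n\\d+\\.|\\z)",
                    Pattern.DOTALL);

    // Returns [word, phonetics, definition1, example1, definition2, example2, ...].
    static List<String> parse(String entry) {
        List<String> parts = new ArrayList<>();
        Matcher head = HEAD.matcher(entry);
        if (!head.find()) {
            throw new IllegalArgumentException("no 'word [phonetics]' header found");
        }
        parts.add(head.group(1));
        parts.add(head.group(2));

        // Everything after the header is a sequence of definition/example pairs.
        Matcher meaning = MEANING.matcher(entry.substring(head.end()));
        while (meaning.find()) {
            parts.add(meaning.group(1).trim());
            parts.add(meaning.group(2));
        }
        return parts;
    }

    public static void main(String[] args) {
        String entry = "wordWithoutSpace [phonetic information]\n"
                + "1. first definition until colon: example sentence until second definition\n"
                + "2. second definition until colon: example sentence until EOF";
        for (String part : parse(entry)) {
            System.out.println(part);
        }
    }
}
```

The point of the sketch is that each element ends where the next one begins: the example sentence of the first meaning has no terminator of its own and is delimited only by the start of the second meaning or by EOF.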

I tried a lexer with gated semantic predicates (ANTLR version: 3.2):

@members {
  int cs = 0; // current state
  }

@lexer::header {
  package main;
  }

Word :
  {cs==0}?=> .+ ' ' {cs=1;}     // in this state everything until 
  ;                             // Space belongs to the Word, now go to Phon-mode

Phon :
  {cs==1}?=> '[' .+ ']' {cs=2;} // everything in brackets is phonetic-information
;                               // after you have seen this go to next state

MultiDef : 
  {cs==2}?=> Int '.' .+ ':' {cs=3;}
  ;

Def : 
  {cs==2}?=> .+ ':' {cs=3;}
  ;

fragment
Digit :
  '0'..'9';

Int :
  Digit Digit*;

Testing the lexer:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class TestLexer {


    public static void main(String[] args) {


        String str = "Word [phon]1.definition:";
        CharStream input = new ANTLRStringStream(str);
        DudenLexer lexer = new DudenLexer(input);
        Token token;
        while ((token = lexer.nextToken())!=Token.EOF_TOKEN) {
          System.out.println("Token: "+token);
        }
    }
}

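The problems I ran into:

I get the error message: line 1:0 rule Def failed predicate: {cs==2}?
I don't know whether this is even the right approach.

I have been stuck on this for about three days, so any help or hints are much appreciated.

Thank you, Tom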
