
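The problem with parsing dictionary entries (see the examples below) is that there are no explicit start and end tags. Instead:

  • the end tag of one element is already the start tag of the next element
  • or the start tag is not a syntactic element at all, but the current parse state (so it depends on what you have already "seen" in the input stream)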

Example 1, simple input:

wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF

Example 2, entry with multiple definitions:

wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF

This is how I would describe it in words, or in pseudocode:

dictionary-entry : 
     word = .+ ' ' // catch everything as word until you see a space
     phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
     (MultipleMeaning | UniqueMeaning)

MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number 
                                             // before the definition

UniqueMeaning : definition= .+ ':'
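
To make the intended state handling concrete, here is a minimal hand-written sketch in plain Java (not ANTLR) of what the pseudocode above is meant to do. It is only an illustration under assumptions: the class name `EntryScanner`, the token labels, and the `EXAMPLE` state (everything after the colon up to the next numbered line or EOF) are hypothetical and not part of the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntryScanner {

    // Scanner states, playing the role of a "current state" counter.
    private enum State { WORD, PHON, DEFINITION, EXAMPLE }

    // Assumption: a new numbered definition starts at the beginning of a line as "<digits>.".
    private static final Pattern NUMBERED_LINE = Pattern.compile("(?m)^\\s*\\d+\\.");

    /** Splits one dictionary entry into labelled pieces such as "WORD(house)". */
    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.WORD;
        int pos = 0;
        while (pos < input.length()) {
            switch (state) {
                case WORD: {
                    // Everything up to the first space is the word.
                    int end = input.indexOf(' ', pos);
                    if (end < 0) throw new IllegalArgumentException("no space after word");
                    tokens.add("WORD(" + input.substring(pos, end) + ")");
                    pos = end + 1;
                    state = State.PHON;
                    break;
                }
                case PHON: {
                    // Everything in brackets is the phonetic information.
                    int end = input.indexOf(']', pos);
                    if (input.charAt(pos) != '[' || end < 0) {
                        throw new IllegalArgumentException("expected [phonetic information]");
                    }
                    tokens.add("PHON(" + input.substring(pos + 1, end) + ")");
                    pos = end + 1;
                    state = State.DEFINITION;
                    break;
                }
                case DEFINITION: {
                    // A definition runs up to the next colon.
                    int end = input.indexOf(':', pos);
                    if (end < 0) throw new IllegalArgumentException("definition without colon");
                    String text = input.substring(pos, end).trim();
                    // A leading "<digits>." marks one of several numbered definitions.
                    String kind = text.matches("(?s)\\d+\\..*") ? "MULTIDEF" : "DEF";
                    tokens.add(kind + "(" + text + ")");
                    pos = end + 1;
                    state = State.EXAMPLE;
                    break;
                }
                case EXAMPLE: {
                    // The example runs up to the next numbered line, or to EOF.
                    Matcher m = NUMBERED_LINE.matcher(input);
                    int end = m.find(pos) ? m.start() : input.length();
                    tokens.add("EXAMPLE(" + input.substring(pos, end).trim() + ")");
                    pos = end;
                    state = State.DEFINITION;
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String entry = "Word [phon]\n1. first: ex one\n2. second: ex two";
        for (String token : scan(entry)) {
            System.out.println(token);
        }
    }
}
```

The point of the sketch is that no token is recognised by its own delimiters alone: which rule applies depends entirely on the state reached so far. It would misread an example sentence that itself contains a line starting with a number and a dot.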

I tried a lexer with gated semantic predicates (ANTLR version 3.2):

@members {
  int cs = 0; // current state
  }

@lexer::header {
  package main;
  }

Word :
  {cs==0}?=> .+ ' ' {cs=1;}     // in this state everything until 
  ;                             // Space belongs to the Word, now go to Phon-mode

Phon :
  {cs==1}?=> '[' .+ ']' {cs=2;} // everything in brackets is phonetic-information
;                               // after you have seen this go to next state

MultiDef : 
  {cs==2}?=> Int '.' .+ ':' {cs=3;}
  ;

Def : 
  {cs==2}?=> .+ ':' {cs=3;}
  ;

fragment
Digit :
  '0'..'9';

Int :
  Digit Digit*;

Testing the lexer:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

public class TestLexer {

    public static void main(String[] args) {
        String str = "Word [phon]1.definition:";
        CharStream input = new ANTLRStringStream(str);
        DudenLexer lexer = new DudenLexer(input);
        Token token;
        while ((token = lexer.nextToken()) != Token.EOF_TOKEN) {
            System.out.println("Token: " + token);
        }
    }
}

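The problems I ran into:

  • I get the error message: line 1:0 rule Def failed predicate: {cs==2}?
  • I am not sure whether this is the right approach at all.

I have been stuck on this for about three days, so any help or hints would be much appreciated.

Thank you, Tom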
