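The difficulty in parsing dictionary entries (see the examples below) is that there are no explicit start and end tags. Instead:
- the end tag of one element is already the start tag of the next element
- or: the start tag is not a syntactic element at all, but the current parse state (so it depends on what you have already "seen" in the input stream)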
Example 1, a simple entry:
wordWithoutSpace [phonetic information]
definition as everything until colon: example sentence until EOF
Example 2, an entry with multiple definitions:
wordWithoutSpace [phonetic information]
1. first definition until colon: example sentence until second definition
2. second definition until colon: example sentence until EOF
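To make the first point concrete (the end of one element is the start of the next), here is a minimal plain-Java sketch, independent of ANTLR. It treats the next "number dot" marker as the shared boundary by using a regex lookahead. The class name `MeaningSplitSketch` and the method `split` are hypothetical, and the sketch assumes that every numbered meaning starts on its own line and has a non-empty example sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MeaningSplitSketch {
    // number '.' definition ':' example, where the example stops right before
    // the next "<newline><number>." marker or at the end of the input.
    // The lookahead does not consume the marker, so it can start the next match.
    private static final Pattern MEANING = Pattern.compile(
            "(\\d+)\\.\\s*(.+?):\\s*(.+?)(?=\\n\\d+\\.|\\z)", Pattern.DOTALL);

    // Returns one "number | definition | example" string per meaning (hypothetical format).
    static List<String> split(String body) {
        List<String> meanings = new ArrayList<>();
        Matcher m = MEANING.matcher(body);
        while (m.find()) {
            meanings.add(m.group(1) + " | " + m.group(2) + " | " + m.group(3).trim());
        }
        return meanings;
    }

    public static void main(String[] args) {
        String body = "1. first definition: first example\n2. second definition: second example";
        for (String meaning : split(body)) {
            System.out.println(meaning);
        }
    }
}
```

A number followed by a dot at the start of a line inside an example sentence would be mistaken for a new meaning, so this only illustrates the boundary problem and is not a robust parser.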
In words, or rather in pseudocode, I would describe it like this:
dictionary-entry :
word = .+ ' ' // catch everything as word until you see a space
phon = '[' .+ ']' // then follows phonetic, which is everything in brackets
(MultipleMeaning | UniqueMeaning)
MultipleMeaning : Int '.' definition= .+ ':' // a MultipleMeaning has a number
// before the definition
UniqueMeaning : definition= .+ ':'
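For comparison, the same pseudocode can be sketched as a small hand-written scanner in plain Java, where the current state decides how the following characters are interpreted. This is only an illustration under assumptions: the names `EntrySketch` and `parse` are hypothetical, each meaning is assumed to sit on its own line and to contain a colon, and the example sentence is assumed to run to the end of that line.

```java
import java.util.ArrayList;
import java.util.List;

public class EntrySketch {
    // Returns [word, phonetic, definition1, example1, definition2, example2, ...].
    static List<String> parse(String entry) {
        List<String> out = new ArrayList<>();

        // State 0: the word is everything up to the first space.
        int space = entry.indexOf(' ');
        out.add(entry.substring(0, space));

        // State 1: the phonetic information is the text between '[' and ']'.
        int open = entry.indexOf('[', space);
        int close = entry.indexOf(']', open);
        out.add(entry.substring(open + 1, close));

        // State 2: every remaining non-empty line is one meaning.
        for (String line : entry.substring(close + 1).split("\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            // A MultipleMeaning starts with "<number>."; a UniqueMeaning does not.
            line = line.replaceFirst("^\\d+\\.\\s*", "");
            int colon = line.indexOf(':');
            out.add(line.substring(0, colon).trim());   // definition: everything until the colon
            out.add(line.substring(colon + 1).trim());  // example: rest of the line
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "word [phon]\n1. first def: first example\n2. second def: second example"));
    }
}
```

The point of the sketch is that neither the word nor the definition has a delimiter of its own. Which rule applies follows purely from how far the scan has progressed, which is what the state variable in a predicate-based lexer is meant to track.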
I tried a lexer with gated semantic predicates (ANTLR version 3.2):
@members {
int cs = 0; // current state
}
@lexer::header {
package main;
}
Word :
{cs==0}?=> .+ ' ' {cs=1;} // in this state everything until
; // Space belongs to the Word, now go to Phon-mode
Phon :
{cs==1}?=> '[' .+ ']' {cs=2;} // everything in brackets is phonetic-information
; // after you have seen this go to next state
MultiDef :
{cs==2}?=> Int '.' .+ ':' {cs=3;}
;
Def :
{cs==2}?=> .+ ':' {cs=3;}
;
fragment
Digit :
'0'..'9';
Int :
Digit Digit*;
Testing the lexer:
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;
public class TestLexer {
public static void main(String[] args) {
String str = "Word [phon]1.definition:";
CharStream input = new ANTLRStringStream(str);
DudenLexer lexer = new DudenLexer(input);
Token token;
while ((token = lexer.nextToken())!=Token.EOF_TOKEN) {
System.out.println("Token: "+token);
}
}
}
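The problems I am running into:
- I get the error message: line 1:0 rule Def failed predicate: {cs==2}?
- I don't know whether this is the right approach at all.
I have been stuck on this for about three days, so any help or hints would be much appreciated.
Thank you, Tom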