eclipse - How to use ANTLR to display all pronouns in a sentence and their person

Question

Edited to use WayneH's grammar.

Here is what I have in my grammar file:

grammar pfinder;

options {
  language = Java;
}
sentence
    : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
    ;

words 
    :   WORDS {System.out.println($text);};

pronoun returns [String value] 
    : sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
    | ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
    | sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
    | pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
    | psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
    | pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};

sfirst returns [String value] :  ('i'   | 'me'  | 'my'   | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] :  ('he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] :  ('we'  | 'us'  | 'our'  | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] :  ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};

WORDS : LETTER*;// {$channel=HIDDEN;}; 
SPACE : (' ')?;
fragment LETTER :  ('a'..'z' | 'A'..'Z');

And here is what I have in my Java test class:

import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.List;

public class test2 {
    public static void main(String[] args) throws RecognitionException {
        String s;
        Scanner input = new Scanner(System.in);
        System.out.println("Enter a sentence: ");
        s=input.nextLine().toLowerCase();
        ANTLRStringStream in = new ANTLRStringStream(s);
        pfinderLexer lexer = new pfinderLexer(in);
        TokenStream tokenStream = new CommonTokenStream(lexer);
        pfinderParser parser = new pfinderParser(tokenStream); 
        parser.pronoun(); 
    }
}

What do I need to put in the test file so that it displays all the pronouns in a sentence along with their respective values (s1, s2, ...)?

score 1 · Accepted Answer

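I think you need to learn more about lexer rules in ANTLR. Lexer rules start with an uppercase letter and generate tokens for the stream that the parser looks at. Lexer fragment rules do not generate tokens for the stream, but they help other lexer rules generate tokens. Look at the lexer rules WORDS and LETTER: LETTER is not a token, but it does help WORDS create a token.

Now, when a text literal is put into a parser rule (a rule whose name starts with a lowercase letter), that text literal is also a valid token that the lexer will identify and pass on (at least when you use ANTLR; I have not used any of the other tools similar to ANTLR, so I cannot answer for them).

The next thing I noticed is that your 's' and 'pronoun' rules appear to be the same thing. I commented out the 's' rule and put everything into the 'pronoun' rule.

The last thing is to learn how to put actions into the grammar. You already have some in the 's' rule, where you set the return value. I made the pronoun rule return a String value so that, if you want actions in your 'sentence' rule, you can easily accomplish what you describe in your "- i pronoun" comment/answer.

Since I do not know exactly what your results should be, I played with your grammar, made some slight modifications and reorganized it (moving what I thought were parser rules to the top and keeping all lexer rules at the bottom), and put in some actions that I think will show you what you need. There may be several different ways to accomplish this, and I do not think my solution is perfect for any of your possible desired results, but here is a grammar that I was able to get working in ANTLRWorks: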

grammar pfinder;

options {
  language = Java;
}
sentence
    : ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
    ;

words 
    :   WORDS {System.out.println($text);};

pronoun returns [String value] 
    : sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
    | ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
    | sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
    | pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
    | psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
    | pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};

//s returns [String value]
//    :  exp=sfirst  {$value = "s1";}
//    |  exp=ssecond {$value = "s2";}
//    |  exp=sthird  {$value = "s3";}
//    |  exp=pfirst  {$value = "p1";}
//    |  exp=psecond {$value = "p2";}
//    |  exp=pthird  {$value = "p3";}
//    ;

sfirst returns [String value] :  ('i'   | 'me'  | 'my'   | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] :  ('he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] :  ('we'  | 'us'  | 'our'  | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] :  ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};

WORDS : LETTER*;// {$channel=HIDDEN;}; 
SPACE : (' ')?;
fragment LETTER :  ('a'..'z' | 'A'..'Z');

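I think this grammar will show you how to accomplish what you are trying to do, although it will need modification whatever your final result is supposed to be.

Good luck.

I think you only need to change one line in your test class, from parser.pronoun(); to parser.sentence();

You may also want to change a few other things in the grammar:

SPACE : ' ';
sentence : (words | pronoun) (SPACE (words | pronoun))* ('.' | '?');
// then you might want to put a rule between sentence and words/pronoun.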
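To make the shape of that suggested sentence rule concrete, here is a rough sketch in plain Java (not ANTLR, and not generated code) that accepts the same layout: one item, then any number of "single space + item" pairs, then '.' or '?'. The class name SentenceShape and the method items are made up for this illustration, and every item is treated as a plain run of letters.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceShape {
    /**
     * Splits input of the form  item (' ' item)* ('.' | '?')  into its items.
     * Hypothetical helper for illustration only.
     * Throws IllegalArgumentException when the input does not have that shape.
     */
    public static List<String> items(String input) {
        if (input.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        char last = input.charAt(input.length() - 1);
        if (last != '.' && last != '?') {
            throw new IllegalArgumentException("sentence must end with '.' or '?'");
        }
        String body = input.substring(0, input.length() - 1);
        List<String> result = new ArrayList<>();
        // A limit of -1 keeps empty strings, so doubled or trailing spaces are rejected.
        for (String item : body.split(" ", -1)) {
            if (!item.matches("[A-Za-z]+")) {
                throw new IllegalArgumentException("expected a word but found \"" + item + "\"");
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(items("i kicked you.")); // prints [i, kicked, you]
    }
}
```

Because the separator is exactly one space and never optional, an input such as "i  kicked." (two spaces) is rejected instead of being silently accepted.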

score 1 · Accepted Answer

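Fragments do not create tokens, and placing them in parser rules will not produce the desired results.

On my test box, this produced (I think!) the expected result: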

program :
        PRONOUN+
    ;

PRONOUN :
        'i'   | 'me'  | 'my'   | 'mine'
    |   'you' | 'your'| 'yours'| 'yourself'
    |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
    |   'we'  | 'us'  | 'our'  | 'ours'
    |   'yourselves'
    |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
    ;

WS  :   ' ' { $channel = HIDDEN; };

WORD    :   ('A'..'Z'|'a'..'z')+ { $channel = HIDDEN; };

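In ANTLRWorks, the sample input "i kicked you" returned the tree structure: program -> [i, you].

I have to point out that using ANTLR to strip the pronouns out of a sentence is overkill. Consider using a regular expression instead. Note that this grammar is case-sensitive: it only matches the lowercase pronouns listed above. Extending WORD to consume everything other than your pronoun dictionary (punctuation and so on) may be a little tedious, so the input may need to be sanitized first.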
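To illustrate the regular-expression route, here is a minimal sketch in plain Java, without ANTLR. The class name PronounRegex and the method findPronouns are made up for this example; the person codes (s1 through p3) and the pronoun lists are taken from the question's grammar. Each word is lowercased before the lookup, so mixed-case input is handled too.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PronounRegex {
    // Person codes follow the question's grammar: s1..s3 singular, p1..p3 plural.
    private static final Map<String, String> PERSON = new LinkedHashMap<>();

    private static void add(String code, String... words) {
        for (String w : words) {
            PERSON.put(w, code);
        }
    }

    static {
        add("s1", "i", "me", "my", "mine");
        add("s2", "you", "your", "yours", "yourself");
        add("s3", "he", "she", "it", "his", "hers", "its", "him", "her", "himself", "herself");
        add("p1", "we", "us", "our", "ours");
        add("p2", "yourselves");
        add("p3", "they", "them", "their", "theirs", "themselves");
    }

    // A "word" is any run of ASCII letters; everything else is skipped.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Returns entries such as "you(s2)" for every pronoun, in input order. */
    public static List<String> findPronouns(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(sentence);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            String code = PERSON.get(word);
            if (code != null) {
                found.add(word + "(" + code + ")");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [i(s1), you(s2), them(p3)]
        System.out.println(findPronouns("I kicked you and them."));
    }
}
```

Matching whole words first and then looking them up avoids the prefix problem (for example "you" inside "yourself") that a single large alternation can run into.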

---Edit: in response to the OP's second question:

I have changed the original grammar to make parsing easier. The new grammar is:

grammar pfinder;

options {
    backtrack=true;
    output = AST;
}

tokens {
    PROGRAM;
}

program :
        (WORD* p+=PRONOUN+ WORD*)*
        -> ^(PROGRAM $p*)
    ;


PRONOUN :
        'i'   | 'me'  | 'my'   | 'mine'
    |   'you' | 'your'| 'yours'| 'yourself'
    |   'he'  | 'she' | 'it'   | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
    |   'we'  | 'us'  | 'our'  | 'ours' | 'yourselves'
    |   'they'| 'them'| 'their'| 'theirs' | 'themselves'
;

WS  :   ' ' { $channel = HIDDEN; };

WORD    :   ('A'..'Z'|'a'..'z')+;

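Let me explain the changes:

Backtracking is now required to resolve the parser rule program. There may be a better way to write it that does not need backtracking, but this is the first thing that came to mind.
An imaginary token, PROGRAM, has been defined to group our pronouns.
Each matched pronoun is added to the ANTLR variable $p and rewritten in the AST under the imaginary token.
The interpreter code can now use a CommonTree to collect the matched pronouns.

The following is written in C# (I don't know Java), but I wrote it so that you should be able to read and understand it.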

static object[] ReadTokens( string text )
{
    ArrayList results = new ArrayList();
    pfinderLexer Lexer = new pfinderLexer(new Antlr.Runtime.ANTLRStringStream(text));
    pfinderParser Parser = new pfinderParser(new CommonTokenStream(Lexer));
    // syntaxTree is imaginary token {PROGRAM},
    // its children are the pronouns collected by $p in grammar.
    CommonTree syntaxTree = Parser.program().Tree as CommonTree;
    if ( syntaxTree == null ) return null;
    foreach ( object pronoun in syntaxTree.Children )
    {
        results.Add(pronoun.ToString());
    }
    return results.ToArray();
}

Calling ReadTokens("i kicked you and them") returns the array ["i", "you", "them"].

score 1 · Accepted Answer

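If you are trying to do some kind of high-level analysis of spoken or written language, you might consider using a natural language processing tool. For example, TagHelper Tools will tell you which elements are pronouns (as well as verbs, nouns, adverbs, and other esoteric grammatical constructs). (THT is the only tool of this kind I am familiar with, so don't take this as a particular endorsement of how good it is.)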
