
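I am writing a simple Smalltalk-like grammar using ANTLR. It is a simplified version of Smalltalk, but the basic ideas are the same (message passing, for example).

Here is my grammar so far: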

grammar GAL;

options {
    //k=2;
    backtrack=true;
}

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '"' ( options {greedy=false;} : . )* '"' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        ) {$channel=HIDDEN;}
    ;

NEW_LINE
    :   ('\r'?'\n')
    ;

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

BINARY_MESSAGE_CHAR
    :   ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
        ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
    ;

// parser

program
    :   NEW_LINE* (statement (NEW_LINE+ | EOF))*
    ;

statement

    :   message_sending
    |   return_statement
    |   assignment
    |   temp_variables
    ;

return_statement
    :   '^' statement
    ;

assignment
    :   identifier ':=' statement
    ;

temp_variables
    :   '|' identifier+ '|'
    ;

object
    :   raw_object
    ;

raw_object
    :   number
    |   string
    |   identifier
    |   literal
    |   block
    |   '(' message_sending ')'
    ;

message_sending
    :   keyword_message_sending
    ;

keyword_message_sending
    :   binary_message_sending keyword_message?
    ;

binary_message_sending
    :   unary_message_sending binary_message*
    ;

unary_message_sending
    :   object (unary_message)*
    ;

unary_message
    :   unary_message_selector
    ;

binary_message
    :   binary_message_selector unary_message_sending
    ;

keyword_message
    :   (NEW_LINE? single_keyword_message_selector NEW_LINE? binary_message_sending)+
    ;

block 
    : 
      '[' (block_signiture

      )? NEW_LINE* 
      block_body

      NEW_LINE* ']'
    ;

block_body 
    :  (statement 

      )?
      (NEW_LINE+ statement 

      )*
    ;


block_signiture 
    : 
      (':' identifier

      )+ '|'
    ;

unary_message_selector
    :   identifier
    ;

binary_message_selector
    :   BINARY_MESSAGE_CHAR
    ;

single_keyword_message_selector
    :   identifier ':'
    ;

keyword_message_selector
    :   single_keyword_message_selector+
    ;

symbol
    :   '#' (string | identifier | binary_message_selector | keyword_message_selector)
    ; 

literal
    :   symbol block? // if there is block then this is method
    ;

number
    : /*'-'?*/
    ( INT | FLOAT )
    ;

string
    :   STRING
    ;

identifier
    :   ID
    ;

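1. Unary minus

I have a problem with unary minus for numbers (the commented-out part of the number rule). The problem is that a minus sign is a valid binary message. To make things worse, two minus signs are also a valid binary message. What I need is a unary minus whenever there is no object to send a binary message to (for example, -3+4 should use unary minus because there is nothing in front of -3). Likewise, (-3) should be a unary minus. It would be great if 1 -- -2 were parsed as the binary message "--" with the argument -2, but I can live without that. How can I do this?

If I uncomment the unary minus, parsing something like 1-2 fails with MismatchedSetException(0!=null).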
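One common way to separate the two meanings of the minus sign is to decide in the tokenizer, based on the previous token: a '-' directly followed by a digit is a sign only when no operand precedes it. The following is a minimal Python sketch of that idea, not ANTLR code. The names TOKEN_RE, SIGNED_NUM_RE, OPERAND_END and tokenize are made up for illustration, the number format is simplified, and how to express the same check inside the ANTLR lexer is not shown here.

```python
import re

# Simplified tokens: numbers, identifiers, parentheses, 1-2 char binary selectors.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+(?:\.\d+)?)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<OP>[~!@%&*\-+=|\\<>,?/]{1,2})
  | (?P<WS>[ \t]+)
""", re.VERBOSE)

SIGNED_NUM_RE = re.compile(r"-\d+(?:\.\d+)?")

# Token kinds that complete an operand; a '-' right after them is a binary message.
OPERAND_END = {"NUM", "ID", "RPAREN"}

def tokenize(text):
    """Return (kind, text) pairs, folding a leading '-' into a number literal."""
    tokens = []
    pos = 0
    while pos < len(text):
        prev_kind = tokens[-1][0] if tokens else None
        # No operand before the '-': try to read it as the sign of a number.
        if text[pos] == "-" and prev_kind not in OPERAND_END:
            signed = SIGNED_NUM_RE.match(text, pos)
            if signed:
                tokens.append(("NUM", signed.group()))
                pos = signed.end()
                continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # whitespace only separates tokens here
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("-3+4"))     # [('NUM', '-3'), ('OP', '+'), ('NUM', '4')]
print(tokenize("1-2"))      # [('NUM', '1'), ('OP', '-'), ('NUM', '2')]
print(tokenize("1 -- -2"))  # [('NUM', '1'), ('OP', '--'), ('NUM', '-2')]
```

Because binary selectors may be two characters long, an input such as 1+-2 would still be read as the selector "+-" followed by 2 in this sketch; separating the operator from the sign with a space avoids that.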

2. Message chaining

What is the best way to implement Smalltalk-style message chaining? I mean something like this:

obj message1 + 3; 
    message2; 
    + 3; 
    keyword: 2+3

In this case every message is sent to the same object, obj. Message precedence should be preserved (unary > binary > keyword).
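The chaining described above can be sketched as a small recursive-descent parser: read the receiver once, then parse one message chain per ';'-separated part and apply each chain to that same receiver, in the order unary, binary, keyword. This is a Python illustration under stated assumptions, not ANTLR code. The Parser class, the tuple shapes and the token names are hypothetical. It follows the semantics stated in the question (every part goes to obj); standard Smalltalk instead cascades to the receiver of the last message of the first expression.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<KEYWORD>[A-Za-z_]\w*:(?!=))
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<NUM>\d+)
  | (?P<OP>[~!@%&*\-+=|\\<>,?/]{1,2})
  | (?P<SEMI>;)
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens + [("EOF", "")]

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def kind(self):
        return self.tokens[self.i][0]

    def take(self, kind):
        tok_kind, value = self.tokens[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} {value!r}")
        self.i += 1
        return value

    def primary(self):
        if self.kind() == "NUM":
            return int(self.take("NUM"))
        return self.take("ID")

    def unary_operand(self):
        # A primary followed by unary messages binds tightest.
        node = self.primary()
        while self.kind() == "ID":
            node = ("unary", node, self.take("ID"))
        return node

    def binary_operand(self):
        # Argument of a keyword part: binary messages over unary operands.
        node = self.unary_operand()
        while self.kind() == "OP":
            node = ("binary", node, self.take("OP"), self.unary_operand())
        return node

    def messages(self, receiver):
        # One cascade part: unary*, then binary*, then an optional keyword message.
        node = receiver
        while self.kind() == "ID":
            node = ("unary", node, self.take("ID"))
        while self.kind() == "OP":
            node = ("binary", node, self.take("OP"), self.unary_operand())
        parts = []
        while self.kind() == "KEYWORD":
            parts.append((self.take("KEYWORD"), self.binary_operand()))
        return ("keyword", node, parts) if parts else node

    def cascade(self):
        receiver = self.primary()
        sends = [self.messages(receiver)]
        while self.kind() == "SEMI":
            self.take("SEMI")
            sends.append(self.messages(receiver))  # same receiver for every part
        self.take("EOF")
        return ("cascade", receiver, sends)

tree = Parser("obj message1 + 3; message2; + 3; keyword: 2+3").cascade()
for send in tree[2]:
    print(send)
```

Each entry of the resulting list is an ordinary message-send node rooted at obj, so the node shapes for single sends do not change; only the list wrapper is new.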

3. Backtracking

Most of the grammar can be parsed with k=2, but when the input looks like this:

1 + 2
Obj message: 
    1 + 2
    message2: 'string'

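the parser tries to match Obj as a single_keyword_message_selector and raises UnwantedTokenException on the token message. If I remove k=2 and set backtrack=true (as I did), everything works as it should. How can I remove backtracking and still get the desired behavior?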

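Also, most of the grammar can be parsed with k=1, so I tried to set k=2 only for the rules that need it, but that setting is ignored. I did something like this: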

rule
    options { k = 2; }
    : // rule definition
    ;

But it does not work until I set k in the global options. What am I missing here?


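Update

Writing the grammar from scratch is not an ideal solution, because I have a lot of code that depends on it. Also, some Smalltalk features are missing by design. This is not meant to be another Smalltalk implementation; Smalltalk was just an inspiration.

I would be more than happy to have unary minus working in cases like these: -1+2 and 2+(-1). Cases like 2 -- -1 are not that important.

Also, message chaining should be done as simply as possible. That means I do not like the idea of changing the AST I am generating.

About backtracking: I can live with it; I am only asking here out of personal curiosity.

Here is a slightly modified grammar that generates the AST. Maybe it helps to better understand what I do not want to change. (temp_variables will probably be removed; I have not made that decision yet.)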

grammar GAL;

options {
    //k=2;
    backtrack=true;
    language=CSharp3;
    output=AST;
}

tokens {
    HASH     = '#';
    COLON    = ':';
    DOT      = '.';
    CARET    = '^';
    PIPE     = '|';
    LBRACKET = '[';
    RBRACKET = ']';
    LPAREN   = '(';
    RPAREN   = ')';
    ASSIGN   = ':=';
}

// generated files options
@namespace { GAL.Compiler }
@lexer::namespace { GAL.Compiler}

// this will disable the CLSCompliant warning in ANTLR generated code
@parser::header { 
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021 
}

@lexer::header { 
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021 
}

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '"' ( options {greedy=false;} : . )* '"' {$channel=Hidden;}
    ;

WS  :   ( ' '
        | '\t'
        ) {$channel=Hidden;}
    ;

NEW_LINE
    :   ('\r'?'\n')
    ;

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

BINARY_MESSAGE_CHAR
    :   ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
        ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
    ;

// parser

public program returns [ AstProgram program ]
    : { $program = new AstProgram(); }
    NEW_LINE* 
    ( statement (NEW_LINE+ | EOF)
        { $program.AddStatement($statement.stmt); }
    )*
    ;

statement returns [ AstNode stmt ]
    : message_sending
        { $stmt = $message_sending.messageSending; } 
    | return_statement
        { $stmt = $return_statement.ret; }
    | assignment
        { $stmt = $assignment.assignment; }
    | temp_variables
        { $stmt = $temp_variables.tempVars; }
    ;

return_statement returns [ AstReturn ret ]
    : CARET statement
        { $ret = new AstReturn($CARET, $statement.stmt); }
    ;

assignment returns [ AstAssignment assignment ]
    : dotted_expression ASSIGN statement
        { $assignment = new AstAssignment($dotted_expression.dottedExpression, $ASSIGN, $statement.stmt); }
    ;

temp_variables returns [ AstTempVariables tempVars ]
    : p1=PIPE 
        { $tempVars = new AstTempVariables($p1); }
    ( identifier
        { $tempVars.AddVar($identifier.identifier); }
    )+ 
    p2=PIPE
        { $tempVars.EndToken = $p2; }
    ;

object returns [ AstNode obj ]
    : number
        { $obj = $number.number; }
    | string
        { $obj = $string.str; }
    | dotted_expression
        { $obj = $dotted_expression.dottedExpression; }
    | literal
        { $obj = $literal.literal; }
    | block
        { $obj = $block.block; }
    | LPAREN message_sending RPAREN
        { $obj = $message_sending.messageSending; }
    ;

message_sending returns [ AstKeywordMessageSending messageSending ]
    : keyword_message_sending
        { $messageSending = $keyword_message_sending.keywordMessageSending; }
    ;

keyword_message_sending returns [ AstKeywordMessageSending keywordMessageSending ]
    : binary_message_sending 
        { $keywordMessageSending = new AstKeywordMessageSending($binary_message_sending.binaryMessageSending); }
    ( keyword_message
        { $keywordMessageSending = $keywordMessageSending.NewMessage($keyword_message.keywordMessage); }
    )?
    ;

binary_message_sending returns [ AstBinaryMessageSending binaryMessageSending ]
    : unary_message_sending
        { $binaryMessageSending = new AstBinaryMessageSending($unary_message_sending.unaryMessageSending); }
    ( binary_message
        { $binaryMessageSending = $binaryMessageSending.NewMessage($binary_message.binaryMessage); }
    )*
    ;

unary_message_sending returns [ AstUnaryMessageSending unaryMessageSending ]
    : object 
        { $unaryMessageSending = new AstUnaryMessageSending($object.obj); }
    (
      unary_message
        { $unaryMessageSending = $unaryMessageSending.NewMessage($unary_message.unaryMessage); }
    )*
    ;

unary_message returns [ AstUnaryMessage unaryMessage ]
    : unary_message_selector
        { $unaryMessage = new AstUnaryMessage($unary_message_selector.unarySelector); }
    ;

binary_message returns [ AstBinaryMessage binaryMessage ]
    : binary_message_selector unary_message_sending
        { $binaryMessage = new AstBinaryMessage($binary_message_selector.binarySelector, $unary_message_sending.unaryMessageSending); }
    ;

keyword_message returns [ AstKeywordMessage keywordMessage ]
    : 
    { $keywordMessage = new AstKeywordMessage(); }
    (
      NEW_LINE? 
      single_keyword_message_selector 
      NEW_LINE? 
      binary_message_sending
        { $keywordMessage.AddMessagePart($single_keyword_message_selector.singleKwSelector, $binary_message_sending.binaryMessageSending); }
    )+
    ;

block returns [ AstBlock block ]
    : LBRACKET 
        { $block = new AstBlock($LBRACKET); }
    (
      block_signiture
        { $block.Signiture = $block_signiture.blkSigniture; }
    )? NEW_LINE* 
      block_body
        { $block.Body = $block_body.blkBody; }
      NEW_LINE* 
      RBRACKET
        { $block.SetEndToken($RBRACKET); }
    ;

block_body returns [ IList<AstNode> blkBody ]
    @init { $blkBody = new List<AstNode>(); }
    : 
    ( s1=statement 
        { $blkBody.Add($s1.stmt); }
    )?
    ( NEW_LINE+ s2=statement 
        { $blkBody.Add($s2.stmt); }
    )*
    ;


block_signiture returns [ AstBlockSigniture blkSigniture ]
    @init { $blkSigniture = new AstBlockSigniture(); }
    : 
    ( COLON identifier
        { $blkSigniture.AddIdentifier($COLON, $identifier.identifier); }
    )+ PIPE
        { $blkSigniture.SetEndToken($PIPE); }
    ;

unary_message_selector returns [ AstUnaryMessageSelector unarySelector ]
    : identifier
        { $unarySelector = new AstUnaryMessageSelector($identifier.identifier); }
    ;

binary_message_selector returns [ AstBinaryMessageSelector binarySelector ]
    : BINARY_MESSAGE_CHAR
        { $binarySelector = new AstBinaryMessageSelector($BINARY_MESSAGE_CHAR); }
    ;

single_keyword_message_selector returns [ AstIdentifier singleKwSelector ]
    : identifier COLON
        { $singleKwSelector = $identifier.identifier; }
    ;

keyword_message_selector returns [ AstKeywordMessageSelector keywordSelector ]
    @init { $keywordSelector = new AstKeywordMessageSelector(); }
    : 
    ( single_keyword_message_selector
        { $keywordSelector.AddIdentifier($single_keyword_message_selector.singleKwSelector); }
    )+
    ;

symbol returns [ AstSymbol symbol ]
    : HASH 
    ( string 
        { $symbol = new AstSymbol($HASH, $string.str); }
    | identifier 
        { $symbol = new AstSymbol($HASH, $identifier.identifier); }
    | binary_message_selector 
        { $symbol = new AstSymbol($HASH, $binary_message_selector.binarySelector); }
    | keyword_message_selector
        { $symbol = new AstSymbol($HASH, $keyword_message_selector.keywordSelector); }
    )
    ; 

literal returns [ AstNode literal ]
    : symbol
        { $literal = $symbol.symbol; }
    ( block
        { $literal = new AstMethod($symbol.symbol, $block.block); }
    )? // if there is block then this is method
    ;

number returns [ AstNode number ]
    : /*'-'?*/
    ( INT
        { $number = new AstInt($INT); }
    | FLOAT 
        { $number = new AstInt($FLOAT); }
    )
    ;

string returns [ AstString str ]
    : STRING
        { $str = new AstString($STRING); }
    ;

dotted_expression returns [ AstDottedExpression dottedExpression ]
    : i1=identifier 
        { $dottedExpression = new AstDottedExpression($i1.identifier); }
    (DOT i2=identifier
        { $dottedExpression.AddIdentifier($i2.identifier); }
    )*
    ;

identifier returns [ AstIdentifier identifier ]
    : ID
        { $identifier = new AstIdentifier($ID); }
    ;

1 Answer


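Hi Smalltalk grammar writer,

First, to get a Smalltalk grammar to parse (1 -- -2) properly and to support the optional '.' on the last statement, etc., you should treat whitespace as significant. Do not put it on the hidden channel.

The grammar so far does not break the rules down into small enough fragments. That will cause problems like the ones you have seen with k=2 and backtracking.

I suggest you check out a working Smalltalk grammar in ANTLR, as defined by the Redline Smalltalk project: http://redline.st and https://github.com/redline-smalltalk/redline-smalltalk

Rgs, James.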

Answered 2012-05-11T04:49:51.527