6

I'm implementing a new DSL in Marpa and (coming from Regexp::Grammars) I'm more than satisfied. My language supports a bunch of unary and binary operators, objects with C-style identifiers and method calls using the familiar dot notation. For example:

foo.has(bar == 42 AND baz == 23)

I found the prioritized rules feature offered by Marpa's grammar description language and have come to rely on that a lot, so I have nearly only one G1 rule Expression. Excerpt (many alternatives, and semantic actions omitted for brevity):

Expression ::=
      NumLiteral
    | '(' Expression ')'             assoc => group
   || Expression ('.') Identifier
   || Expression ('.') Identifier Args
    | Expression ('==') Expression
   || Expression ('AND') Expression

Args     ::= ('(') ArgsList (')')
ArgsList ::= Expression+             separator => [,]

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

As you can see, I'm using the Scanless interface (SLIF). My problem is that this also parses, for example:

foo.AND(5)

Marpa knows that there can only be an identifier after a dot, so it doesn't even consider the fact that AND might be a keyword. I know that I can avoid that problem by doing a separate lexing stage that identifies AND as a keyword explicitly, but that tiny papercut is not quite worth the effort.

Is there a way in SLIF to restrict the Identifier rule to non-keyword identifiers only?

4

2 回答 2

2

我不知道如何在语法中表达这样的事情。但是,您可以为 Identifier 引入一个中间非终结符来检查条件:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };

use Marpa::R2;

my %reserved = map { $_ => 1 } qw( AND );

my $grammar = 'Marpa::R2::Scanless::G'->new(
    { bless_package => 'main',
      source => \( << '__GRAMMAR__'),

:default ::= action => store

:start ::= S
S ::= Id
  | Id NumLiteral
Id ::= Identifier action => allowed

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

:discard ~ whitespace
whitespace ~ [\s]+

__GRAMMAR__
    });

for my $value ('ABC', 'ABC 42', 'AND 1') {
    my $value = $grammar->parse(\$value, 'main');
    print $$value, "\n";
}


sub store {
    my (undef, $id, $arg) = @_;
    $arg //= 'null';
    return "$id $arg";
}

sub allowed {
    my (undef, $id) = @_;
    die "Reserved keyword $id" if $reserved{$id};
    return $id
}
于 2014-11-24T17:39:41.393 回答
2

您可以使用仅用于此类事情的词位优先级,示例Marpa::R2 测试套件中。

基本上,你声明<AND keyword> ~ 'AND'lexeme 并给它优先级 1 以便它优先于Identifier. 那一定能解决问题。

PS 我稍微修改了上面的脚本来举个例子——代码输出

于 2014-11-25T06:35:05.660 回答