
I'm trying to munge a simple grammar with a perl regex (note this isn’t intended for production use, just a quick analysis for providing editor hints/completions). For instance,

my $GRAMMAR = qr{(?(DEFINE)
  (?<expr> \( (?&expr) \) | (?&number) | (?&var) | (?&expr) (?&op) (?&expr) )
  (?<number> \d++ )
  (?<var> [a-z]++ )
  (?<op> [-+*/] )
)}x;

I would like to be able to run this as

$expr =~ /$GRAMMAR(?&expr)/;

and then access all the variable names. However, according to perlre,

Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary. Thus $+{NAME_PAT} would not be defined even though $+{NAME} would be.

So apparently this is not possible. I could try using a (?{ code }) block to save variable names to a hash, but this doesn't respect backtracking (i.e. the assignment’s side effect persists even if the variable is backtracked past).
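
For reference, perlre describes using local inside a (?{ code }) block so that the assignment is undone when the engine backtracks past it. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the grammar is first rewritten so it is no longer left-recursive; the term rule and the @vars and @found names are illustrative and not part of the original grammar.

```perl
use strict;
use warnings;

our @vars;    # localized inside the regex, so backtracking undoes additions
our @found;   # copied from @vars only after the whole pattern has matched

my $GRAMMAR = qr{(?(DEFINE)
  (?<expr>   (?&term) (?: \s* (?&op) \s* (?&expr) )? )
  (?<term>   \( \s* (?&expr) \s* \) | (?&number) | (?&var) )
  (?<number> \d++ )
  # $^N holds the text of the most recently closed group, here the name
  (?<var>    ( [a-z]++ ) (?{ local @vars = (@vars, $^N) }) )
  (?<op>     [-+*/] )
)}x;

if ("(a + (b - c))" =~ /^ $GRAMMAR (?&expr) $ (?{ @found = @vars }) /x) {
  print "[$_]\n" for @found;
}
else {
  print "no match\n";
}
```

The final code block runs only once the rest of the pattern has succeeded, so @found never contains names from abandoned match attempts.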

Is there any way to get everything captured by a given named capture group, including recursive matches? Or do I need to manually dig through the individual pieces (and thus duplicate all the patterns)?


2 Answers


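Having to add the capture and backtracking machinery yourself is one of the shortcomings that Regexp::Grammars addresses.

However, the grammar in your question is left-recursive, which neither Perl regexes nor recursive-descent parsers can parse.

Adapting your grammar to Regexp::Grammars and factoring out the left recursion produces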

my $EXPR = do {
  use Regexp::Grammars;
  qr{
    ^ <Expr> $

    <rule: Expr>        <Term> <ExprTail>
               |        <Term>

    <rule: Term>        <Number>
               |        <Var>
               |        \( <MATCH=Expr> \)

    <rule: ExprTail>    <Op> <Expr>

    <token: Op>         \+ | \- | \* | \/

    <token: Number>     \d++

    <token: Var>        [a-z]++
  }x;
};

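Note that this simple grammar gives all operators equal precedence rather than following Please Excuse My Dear Aunt Sally.

You want to extract all the variable names, so you could walk the AST like this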

sub all_variables {
  my($root,$var) = @_;

  $var ||= {};
  ++$var->{ $root->{Var} } if exists $root->{Var};
  all_variables($_, $var) for grep ref $_, values %$root;

  wantarray ? keys %$var : [ keys %$var ];
}

and print the result

if ("(a + (b - c))" =~ $EXPR) {
  print "[$_]\n" for sort +all_variables \%/;
}
else {
  print "no match\n";
}
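
To see how the recursive walk behaves without installing anything, here is a small sketch that runs the same idea over a hand-built nested hash. The %tree below is a simplified, hypothetical stand-in and not the exact structure Regexp::Grammars produces; collect_vars is likewise an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical, hand-built stand-in for a result tree; not real module output.
my %tree = (
  Expr => {
    Term     => { Var => 'a' },
    ExprTail => {
      Op   => '+',
      Expr => {
        Term     => { Var => 'b' },
        ExprTail => { Op => '-', Expr => { Term => { Var => 'a' } } },
      },
    },
  },
);

# Count every Var entry found anywhere in the nested hashes.
sub collect_vars {
  my ($node, $seen) = @_;
  $seen ||= {};
  ++$seen->{ $node->{Var} } if exists $node->{Var};
  collect_vars($_, $seen) for grep { ref $_ eq 'HASH' } values %$node;
  return $seen;
}

my $seen = collect_vars(\%tree);
print "$_ seen $seen->{$_} time(s)\n" for sort keys %$seen;
```

Because the counts live in a hash, a variable that appears several times is still reported once, which is why the answer's version can simply return the keys.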

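Another approach is to install an automatic action for the Var rule that records the names of variables as they are successfully parsed.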

package JustTheVarsMaam;

sub new { bless {}, shift }

sub Var {
  my($self,$result) = @_;
  ++$self->{VARS}{$result};
  $result;
}

sub all_variables { keys %{ $_[0]->{VARS} } }

1;

Call this one like so

my $vars = JustTheVarsMaam->new;
if ("(a + (b - c))" =~ $EXPR->with_actions($vars)) {
  print "[$_]\n" for sort $vars->all_variables;
}
else {
  print "no match\n";
}

Either way, the output is

[a]
[b]
[c]
answered 2013-07-12T01:28:08.780

Marpa::R2 can handle the recursion using the BNF in the __DATA__ section below:

#!/usr/bin/env perl
use strict;
use diagnostics;
use Marpa::R2;

my $input = shift || '(a + (b - c))';

my $grammar_source = do {local $/; <DATA>};
my $recognizer = Marpa::R2::Scanless::R->new
  (
   {
    grammar => Marpa::R2::Scanless::G->new
    (
     {
      source => \$grammar_source,
      action_object => __PACKAGE__,
     }
    )
   },
  );
my %vars = ();
sub new { return bless {}, shift;}
sub varAction { ++$vars{$_[1]}};

$recognizer->read(\$input);
$recognizer->value() || die "No parse";

print join(', ', sort keys %vars)  . "\n";

__DATA__
:start ::= expr
expr ::= NUMBER
       | VAR action => varAction
       | expr OP expr
       | '(' expr ')'
NUMBER ~ [\d]+
VAR ~ [a-z]+
OP ~ [-+*/]
WS ~ [\s]+
:discard ~ WS

The output is:

a, b, c

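Your question was only about how to get the variable names, so this answer has no notion of operator associativity and the like. Note that Marpa has no problem with that if you need it.

answered 2013-07-12T18:51:17.270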