perl - Perl pattern matching with optional tokens

Question

I have a string like this:

$words = "[a] (good|bad) word [for fun]";

where:

everything inside [] is optional
and the values inside (..|..) are mandatory OR alternatives: exactly one of them must be used

So the possible results for the string above are:

a good word for fun

a bad word for fun

a good word

a bad word

good word for fun 

bad word for fun

good word 

bad word

Can someone help me find a way to extract all possible results (like the examples above) and store them in an array?

Thanks!

score 2 · Accepted Answer

use warnings;
use strict;
use constant { OPT => 0, OR => 1, FIXED => 2 };

my $words = "[a] (good|bad) word [for fun]";
my @tokens;
# parse input
my @v = grep {$_} split /(\[|\]|\(|\||\))/, $words;
while (my $token = shift @v) {
  if ($token eq '[') {
    push @tokens, [ OPT, shift @v ];
    shift @v; # ]
  } elsif ($token eq '(') {
    my @list;
    do {
      push (@list, [ FIXED, shift @v] );
    } until ((shift @v) eq ')');   # '|' means another alternative follows, ')' closes the group
    push @tokens, [ OR, \@list ];
  }
  else {
    push @tokens, [FIXED, $token];
  }
}
# generate output
my @phrases = ("");
for my $token (@tokens) {
  my @additions;
  if ($token->[0] == OPT) {
    push @additions, $_.$token->[1] for @phrases;
  } elsif ($token->[0] == FIXED) {
    $_ .= $token->[1] for @phrases;
  } elsif ($token->[0] == OR) {
    foreach my $list (@{$token->[1]}) {
      push @additions, $_.$list->[1] for @phrases;
    }   
    @phrases = (); 
  }
  push @phrases, @additions;
}


# trim leading/trailing spaces and collapse repeated spaces before printing
print "$_\n" for map { s/^\s+|\s+$//g; s/ +/ /g; $_ } @phrases;

score 1 · Accepted Answer

I saw this as an opportunity to try using Parse::RecDescent. I don't understand these things very well, so there might have been a better way to write the grammar.

The parser allows me to generate a list of sets of phrases to use. Then, I feed that list of sets to Set::CrossProduct to generate the Cartesian product of sets.

#!/usr/bin/env perl

use strict;
use warnings;

use Parse::RecDescent;
use Set::CrossProduct;

our @list;

my $parser = Parse::RecDescent->new(q{
    List: OptionalPhrase |
          AlternatingMandatoryPhrases |
          FixedPhrase

    OptionalPhrase:
        OptionalPhraseStart
        OptionalPhraseContent
        OptionalPhraseEnd

    OptionalPhraseStart: /\\[/

    OptionalPhraseContent: /[^\\]]+/
        {
            push @::list, [ $item[-1], '' ];
        }

    OptionalPhraseEnd: /\\]/

    AlternatingMandatoryPhrases:
        AlternatingMandatoryPhrasesStart
        AlternatingMandatoryPhrasesContent
        AlternatingMandatoryPhraseEnd

    AlternatingMandatoryPhrasesStart: /\\(/

    AlternatingMandatoryPhrasesContent: /[^|)]+(?:[|][^|)]+)*/
        {
            push @::list, [ split /[|]/, $item[-1] ];
        }

    AlternatingMandatoryPhraseEnd: /\\)/

    FixedPhrase: /[^\\[\\]()]+/
        {
            $item[-1] =~ s/\\A\\s+//;
            $item[-1] =~ s/\s+\z//;
            push @::list, [ $item[-1] ];
        }
});

my $words = "[a] (good|bad) word [for fun]";

1 while defined $parser->List(\$words);

my $iterator = Set::CrossProduct->new(\@list);

while (my $next = $iterator->get) {
    print join(' ', grep length, @$next), "\n";
}

Output:

a good word for fun
a good word
a bad word for fun
a bad word
good word for fun
good word
bad word for fun
bad word
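The same list-of-sets idea also works without CPAN modules. Below is a minimal sketch under stated assumptions: `template_to_parts` and `expand` are hypothetical helper names, the template has no nesting and no escaped brackets, and fragments are separated by spaces (the chosen pieces are re-joined with single spaces, so a template like `un(happy|lucky)` would not come out right).

```perl
use strict;
use warnings;

# Hypothetical helper: split a template into one array of alternatives per
# fragment, following the question's rules:
#   [text]      optional -> ("text", "")
#   (x|y|...)   one of   -> ("x", "y", ...)
#   plain text  fixed    -> ("plain text")
# Assumes no nesting and no escaped brackets.
sub template_to_parts {
    my ($template) = @_;
    my @parts;
    while ($template =~ /\[([^\]]*)\]|\(([^)]*)\)|([^\[(]+)/g) {
        my ($optional, $choices, $fixed) = ($1, $2, $3);
        if (defined $optional) {
            push @parts, [ $optional, '' ];
        }
        elsif (defined $choices) {
            push @parts, [ split /\|/, $choices ];
        }
        else {
            $fixed =~ s/^\s+|\s+$//g;    # drop the spaces around fixed text
            push @parts, [ $fixed ] if length $fixed;
        }
    }
    return @parts;
}

# Cartesian product: extend every partial phrase (an array ref of pieces)
# by every alternative of the next fragment.
sub expand {
    my @partial = ([]);
    for my $alternatives (@_) {
        @partial = map {
            my $prefix = $_;
            map { [ @$prefix, $_ ] } @$alternatives;
        } @partial;
    }
    # Join the non-empty pieces with single spaces.
    return map { join ' ', grep { length } @$_ } @partial;
}

my @results = expand(template_to_parts("[a] (good|bad) word [for fun]"));
print "$_\n" for @results;
```

This builds the whole list in memory. For templates with many parts, an iterator such as Set::CrossProduct's `get` (used in the answer above) avoids that.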

score 1 · Accepted Answer

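With a regular expression you can check whether "bad word" matches your pattern "[a] (good|bad) word [for fun]" (written as a regex, that might be spelled /(a )?(good|bad) word( for fun)?/). But it sounds like you actually want the opposite, i.e. to generate all possible inputs from your pattern. That is not something a regular expression can do.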
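To illustrate the matching direction, here is a minimal sketch that anchors the regex with `^` and `$` and uses non-capturing groups (the candidate strings are made up for illustration). It can only accept or reject a string it is given; it cannot list the phrases.

```perl
use strict;
use warnings;

# "[a] (good|bad) word [for fun]" as an anchored regex:
# optional "a ", then "good" or "bad", then " word", then optional " for fun".
my $re = qr/^(?:a )?(?:good|bad) word(?: for fun)?$/;

# Made-up candidates: only whole-string matches are accepted.
my @candidates = ('bad word', 'a good word for fun', 'good', 'a bad word for');
my @matching   = grep { /$re/ } @candidates;

print "$_\n" for @matching;
```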

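What you should look into is called permutations (strictly speaking, you want the Cartesian product of the choices for each part). Your template string consists of the following parts: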

"a " or nothing
"good" or "bad"
" word"
" for fun" or nothing

So parts 1 and 2 each have two possibilities, part 3 has only one, and part 4 again has two, which gives you 2 * 2 * 1 * 2 = 8 possibilities.
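The count can also be computed from the number of alternatives in each part instead of by hand. A minimal sketch; the parts are the four listed above, with "" standing for "nothing".

```perl
use strict;
use warnings;

# Alternatives per part: "a " or nothing, "good" or "bad", " word",
# " for fun" or nothing.
my @parts = (["a ", ""], ["good", "bad"], [" word"], [" for fun", ""]);

# Multiply the number of alternatives of every part: 2 * 2 * 1 * 2.
my $total = 1;
$total *= @$_ for @parts;

print "$total\n";    # prints 8
```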

Just store all these possibilities in a multidimensional array, e.g.

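my $sentence = [["a ", ""], ["good", "bad"], [" word"], [" for fun", ""]];

Then look on CPAN for a permutation algorithm or module to find all combinations.

As an example of a single combination, "bad word" would be represented as:

my $badword =
    $sentence->[0]->[1]     # ""
  . $sentence->[1]->[1]     # "bad"
  . $sentence->[2]->[0]     # " word"
  . $sentence->[3]->[1];    # ""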
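Instead of a CPAN module, the combinations can also be generated in a few lines of plain Perl. Below is a minimal sketch of that Cartesian product over the array, assuming the spaces are stored inside the parts so plain concatenation gives correctly spaced phrases.

```perl
use strict;
use warnings;

# One array of alternatives per part of "[a] (good|bad) word [for fun]";
# "" means "leave this part out", and the spaces are stored inside the parts.
my $sentence = [["a ", ""], ["good", "bad"], [" word"], [" for fun", ""]];

# Start from one empty phrase and, for each part, extend every phrase
# built so far with every alternative of that part.
my @phrases = ("");
for my $alternatives (@$sentence) {
    @phrases = map {
        my $prefix = $_;
        map { $prefix . $_ } @$alternatives;
    } @phrases;
}

print "$_\n" for @phrases;    # 8 lines, from "a good word for fun" to "bad word"
```

The list grows by a factor of the part's size at each step, so it ends with 2 * 2 * 1 * 2 = 8 phrases in the same order as the Set::CrossProduct answer.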
