
My search text is as follows.

...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...

It contains many lines (it is actually a JavaScript file), but I need to parse the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.

Here is the Perl code; a PHP version is at the bottom.

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";

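I know the script above matches every quoted string, but it would also pick up strings on other lines (such as "xyz"). So I need to anchor the match on the text var strings =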

/var strings = \[(?:\"(.+?)\",?)/g

With the regex above, it captures aaa.

/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g

With the one above, I get aaa and bbb. So, to avoid repeating the pattern, I used the '+' quantifier, as follows.

/var strings = \[(?:\"(.+?)\",?)+/g

But then I get only eee. So my question is: why do I get only eee when I use the '+' quantifier?

Update 1: using PHP preg_match_all (added to get more attention :-))

$str = <<<STR
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR;

preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);

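Update 2: Why does it match eee? Because (?:\"(.+?)\",?)+ is greedy. With the greediness removed, /var strings = \[(?:\"(.+?)\",?)+?/, aaa is matched. But why is there only one result? Is there any way to achieve this with a single regex?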


3 Answers


Here is a single-regex solution:

/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g

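\G is a zero-width assertion that matches at the position where the previous match ended (or at the beginning of the string if this is the first match attempt). So this acts like: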

var\s+strings\s*=\s*\[\s*"([^"]*)"

...on the first attempt, and then like:

,\s*"([^"]*)"

...after that, except that each match has to start exactly where the previous match ended.

Here is a PHP demo, but it works in Perl as well.
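
As a minimal sketch of how the pattern above could be run in Perl: the sample text here, including the decoy lines with "xyz" and "zzz", is made up for illustration and is not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input; the "xyz" and "zzz" lines are decoys
# that should not contribute any matches.
my $str = <<'STR';
var other = ["xyz"];
var strings = ["aaa","bbb","ccc","ddd","eee"];
var more = ["zzz"];
STR

# The first alternative locates the start of the array. After that,
# "\G," lets each later match continue only from the exact position
# where the previous match ended.
my @matches = $str =~ /(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g;

print "@matches\n";
```

Once the closing ] is reached, neither alternative can match at that position, and \G keeps the second alternative from picking up quoted strings anywhere later in the text.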

answered 2012-07-19T12:08:34.983

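You may prefer this solution, which first looks for the string var strings = [ using the /g modifier. That sets \G to match immediately after the [ for the next regex, which finds all the double-quoted strings that follow directly, each possibly preceded by commas or whitespace.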

my @matches;

if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
  @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

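Despite the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once, because there is no second occurrence of var strings = [. Each match returns the list of values that the capture variables $1, $2, $3 etc. hold when the match completes, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures several values into $1 in turn, leaving only the final one there. You need to write something like the code above, which captures just one value into $1 per match.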
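
To make the hand-off between the two regexes visible, here is a small sketch that prints pos($str) after the first match. The one-line sample input and the $start variable are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-line input with decoy arrays before and after.
my $str = 'var x = ["xyz"]; var strings = ["aaa", "bbb"]; var y = ["zzz"];';

my @matches;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
    # In scalar context, /g records where the match ended in pos($str).
    my $start = pos($str);
    print "resuming at offset $start\n";

    # \G anchors every match of the second regex to that position, then
    # to the end of each previous match, so the scan stops at the "]".
    @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

print "@matches\n";
```

Neither "xyz" nor "zzz" is collected: the first sits before the recorded position, and the second is not contiguous with the previous match.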

answered 2012-07-19T14:39:09.197

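Because the + tells it to repeat the group (?:"(.+?)",?) one or more times. Each repetition overwrites the capture, so by the time the last repetition has matched "eee" and no further repetition is found, only eee is left in the capture.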

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();

The regular expression:

(?-imsx:var strings = \[(?:"(.+?)",?)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

A simpler example is:

my @m = ('abcd' =~ m/(\w)+/g);
print "@m";

This prints only d. That is because:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();

The regular expression:

(?-imsx:(\w)+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

If you put a quantifier on a capture group, only the last repetition is kept.
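
For comparison, a short sketch of where the quantifier can go. The variable names @last, @each and @whole are made up for illustration.

```perl
use strict;
use warnings;

# Quantifier outside the capture: one match, and the capture keeps
# only the last repetition.
my @last = 'abcd' =~ /(\w)+/;

# No quantifier on the capture, /g instead: one capture per match.
my @each = 'abcd' =~ /(\w)/g;

# Quantifier inside the capture: one capture holding the whole run.
my @whole = 'abcd' =~ /(\w+)/;

print "last:  @last\n";    # d
print "each:  @each\n";    # a b c d
print "whole: @whole\n";   # abcd
```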


Here is an approach that works:

my $str = <<STR;
    ...
    ...
    var strings = ["aaa","bbb","ccc","ddd","eee"];
    ...
    ...
STR

my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that

print "@matches";

Update: a one-line solution (though not a single regex) would be:

@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;

But that is very hard to read, IMHO.

answered 2012-07-19T11:20:46.220