regex - Strange issue with regex matching in Perl: only alternate attempts match

Question

Consider the following Perl script:

 #!/usr/bin/perl

 my $str = 'not-found=1,total-found=63,ignored=2';

 print "1. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "2. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "3. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
 print "4. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

 print "Bye!\n";

The output after running it is:

1. matched using regex
3. matched using regex
Bye!

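The same regex matches once, then fails on the very next attempt. Any idea why every other attempt to match the same string with the same regex fails in Perl?

Thanks!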

score 5 · Accepted Answer

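Here is a detailed explanation of why your code doesn't work.

The /g modifier changes the behavior of the regex to "global matching". This matches all occurrences of the pattern in the string. However, how this matching is done depends on context. The two (main) contexts in Perl are list context (the plural) and scalar context (the singular).

In list context, a global regex match returns a list of all matched substrings, or a flat list of all matched captures: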

use feature 'say';
local $_ = "foobaa";
my $regex = qr/[aeiou]/;

my @matches = /$regex/g; # match all vowels
say "@matches"; # "o o a a"
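To illustrate the "flat list of all matched captures" case, here is a minimal sketch that reuses the string from the question. The pattern with two capture groups and the names @flat and %count are assumptions made for this illustration, not part of the original answer.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';

# Two capture groups: in list context /g returns name, value, name, value, ...
my @flat = $str =~ /([\w-]+)=(\d+)/g;

# An even-length flat list can be assigned straight to a hash.
my %count = @flat;

print "$_ => $count{$_}\n" for sort keys %count;
```

Because the captures come back as one flat list, pairing them up (here through a hash assignment) is left to the caller.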

In scalar context, the match appears to return a Perl boolean value describing whether the regex matched:

my $match = /$regex/g;
say $match; # "1" (on failure: the empty string)

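However, the regex turns into an iterator. Each time the regex match is executed, the regex starts at the current position in the string and tries to match from there. If it matches, it returns true. If the match fails, then the match returns false, and the current position in the string is reset to the start.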

Because the position in the string has been reset, the next match will succeed again.

my $match;
say $match while $match = /$regex/g;
say "The match returned false, or the while loop would have gone on forever";
say "But we can match again" if /$regex/g;

The second effect, resetting the position, can be suppressed with the additional /c flag.
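A small sketch of the difference /c makes, using Perl's built-in pos function to inspect the position after a failed match. The string "abc" and the variable names are made up for this example.

```perl
use strict;
use warnings;

my $s = "abc";

my $ok1    = $s =~ /b/g;    # succeeds, position is now 2
my $fail1  = $s =~ /x/g;    # fails without /c: position is reset
my $after_plain = pos($s);  # undef

my $ok2    = $s =~ /b/g;    # succeeds again from the start, position 2
my $fail2  = $s =~ /x/gc;   # fails with /c: position is kept
my $after_c = pos($s);      # still 2

printf "without /c: %s\n", $after_plain // 'undef';
printf "with /c:    %s\n", $after_c     // 'undef';
```

The matches are assigned to scalars only to make the scalar context explicit.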

The position in the string can be accessed with the pos function: pos($string) returns the current position, and it can be assigned to, as in pos($string) = 0.
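As a rough illustration of reading and assigning pos, the following sketch records where each scalar-context /g match ends and then resumes matching from a hand-picked offset. The offset 3 and the variable names are arbitrary choices for this example.

```perl
use strict;
use warnings;

my $string = "foobaa";

# Reading pos: the offset just past each successful match.
my @ends;
while ($string =~ /[aeiou]/g) {
    push @ends, pos($string);
}
print "match end offsets: @ends\n";

# Assigning pos: the next /g match starts searching at offset 3.
pos($string) = 3;
my $resumed = '';
if ($string =~ /([aeiou])/g) {
    $resumed = "$1 ending at " . pos($string);
}
print "resumed match: $resumed\n";
```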

The regex can also be anchored at the current position with the \G assertion, much like ^ anchors a regex at the start of the string.
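A minimal sketch of what \G changes, assuming a made-up input string: with \G the match may only begin exactly at the current position, so iteration stops at the first character that does not fit, while the unanchored pattern is free to skip ahead.

```perl
use strict;
use warnings;

# Anchored: each match must start where the previous one ended.
my $text = "aab_aa";
my $anchored = 0;
$anchored++ while $text =~ /\Ga/gc;
my $stopped_at = pos($text);    # /c keeps the position after the failure

# Unanchored: the regex may skip over "b_" to find more matches.
my $other = "aab_aa";
my $unanchored = 0;
$unanchored++ while $other =~ /a/g;

print "anchored: $anchored (stopped at offset $stopped_at)\n";
print "unanchored: $unanchored\n";
```

A separate string is used for the unanchored count so that the position left behind by the first loop cannot influence it.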

This m//gc style of matching makes it easy to write a tokenizer:

my @tokens;
local $_ = "1, abc, 2 ";
TOKEN: while(pos($_) < length($_)) {
  /\G\s+/gc and next; # skip whitespace
  # if one of the following matches fails, the next token is tried
  if    (/\G(\d+)/gc) { push @tokens, [NUM => $1]}
  elsif (/\G,/gc    ) { push @tokens, ['COMMA'  ]}
  elsif (/\G(\w+)/gc) { push @tokens, [STR => $1]}
  else { last TOKEN } # break the loop only if nothing matched at this position.
}
say "[@$_]" for @tokens;

Output:

[NUM 1]
[COMMA]
[STR abc]
[COMMA]
[NUM 2]

score 3 · Accepted Answer

Get rid of the m and the g on the regex; they are not doing what you want.

print "1. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "2. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "3. matched using regex\n" if ($str =~ /total-found=(\d+)/);
print "4. matched using regex\n" if ($str =~ /total-found=(\d+)/);

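Specifically, the m is optional: in this case m/foo/ is exactly the same as /foo/. The real problem is that g does a number of things you don't want in this case. See perlretut for details.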
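A short sketch of the behavior without /g: the match always starts at the beginning of the string, so every attempt succeeds. The loop, the counter $hits, and the list-context capture into $total are additions for illustration only.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';

# Without /g there is no remembered position, so all four attempts match.
my $hits = 0;
for my $attempt (1 .. 4) {
    $hits++ if $str =~ /total-found=(\d+)/;
}
print "matched $hits of 4 attempts\n";

# In list context a non-/g match returns its captures directly.
my ($total) = $str =~ /total-found=(\d+)/;
print "total-found is $total\n";
```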

score 1 · Accepted Answer

 my $str = 'not-found=1,total-found=63,ignored=2';

 print "1. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

matches total-found=63, and pos($str) is set to offset 26 for the next match attempt.

 print "2. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

matches nothing, because the search starts at offset 26, so pos($str) is reset and the next attempt starts again at offset 0.

That is why

 print "3. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

matches total-found=63 again, and pos($str) is once more set to offset 26 for the next match attempt, which is why

 print "4. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);

fails again just like the second one, resetting pos($str) so that the next attempt would start at offset 0.

 print "Bye!\n";
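The walk-through above can be checked with a small sketch that runs the same scalar-context /g match four times and records pos($str) after each attempt. The loop and the @trace array are assumptions added for this illustration; a reset position is reported by pos as undef, which is printed here as the word "undef".

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';

my @trace;
for my $attempt (1 .. 4) {
    # Scalar-context /g: resumes from pos($str), resets it on failure.
    my $matched = ($str =~ m/total-found=(\d+)/g) ? 1 : 0;
    push @trace, sprintf("%d: %s, pos=%s",
        $attempt,
        $matched ? "matched" : "failed",
        pos($str) // 'undef');
}
print "$_\n" for @trace;
```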
