This might save some time.

Greedy:
([AB]).*(?!\1)[AB]
Non-greedy:
([AB]).*?(?!\1)[AB]
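Both patterns find an A followed later by a B, or a B followed later by an A. They differ only in how far the match stretches. The following minimal Perl sketch uses hypothetical helpers `greedy_span` and `lazy_span`, which are not part of the original answer. Each pattern is wrapped in one extra capture group so the whole match can be returned, which shifts the backreference from `\1` to `\2`.

```perl
use strict;
use warnings;

# Greedy: .* runs to the end of the string and backtracks, so (for a fixed
# start) the match stretches to the LAST letter that differs from the first.
sub greedy_span {
    my ($s) = @_;
    return $s =~ /(([AB]).*(?!\2)[AB])/ ? $1 : undef;
}

# Non-greedy: .*? grows one character at a time, so the match stops at
# the FIRST letter that differs from the captured one.
sub lazy_span {
    my ($s) = @_;
    return $s =~ /(([AB]).*?(?!\2)[AB])/ ? $1 : undef;
}

for my $s ('xAxxBxxAxxB', 'BxxxA', 'AAAA') {
    printf "%-12s greedy=%-11s lazy=%s\n", $s,
        greedy_span($s) // '(none)', lazy_span($s) // '(none)';
}
```

For 'xAxxBxxAxxB' the greedy form returns 'AxxBxxAxxB', while the non-greedy form stops at 'AxxB'. 'AAAA' matches neither, because `(?!\2)` rejects a second A.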
Edit:
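I ran my own benchmark on this. Matching one term at a time (/term/), rather
than two terms in one regex, will always take less time because it doesn't
backtrack; it's about as simple as a strncmp(term). So doing the 2 terms
separately is much faster.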
If you can define the terms so that there is no possibility of overlap, that
is the way to go, i.e. /term1/ && /term2/.
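When the terms cannot overlap, two independent searches chained with && are all that is needed. This is a small sketch, assuming the terms are plain literals (hence `\Q...\E`). The helper name `both_present` is hypothetical.

```perl
use strict;
use warnings;

# Two independent single-term searches. && short-circuits, so the second
# search only runs if the first one succeeded. Neither search knows where
# the other one matched, so overlapping hits are NOT detected here.
sub both_present {
    my ($s, $term1, $term2) = @_;
    # \Q...\E treats each term as a literal string, not as a pattern.
    return ($s =~ /\Q$term1\E/ && $s =~ /\Q$term2\E/) ? 1 : 0;
}

for my $s (' xaaaaaaa term2ater ', ' xaaaaaaa ter2ater ') {
    printf "%-24s %s\n", "'$s'",
        both_present($s, 'term', 'm2a') ? 'both terms present' : 'a term is missing';
}
```

The first string reports both terms even though 'term' and 'm2a' share the 'm' in 'term2ater'. That is exactly the overlap limitation discussed next.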
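There's no way to combine the terms into a single regex without invoking backtracking.
That said, if you really do care about overlap, there are techniques to minimize
the backtracking.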
/(?=.*A)(?=.*B)/ behaves like /A/ && /B/, except that it appears to be much slower. Neither one accounts for overlap.
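To check that the lookahead form answers the same question as two separate matches, and is just as blind to overlap, here is a minimal sketch. The helpers `lookahead_both` and `separate_both` are hypothetical, and the test strings are modeled on the benchmark samples below.

```perl
use strict;
use warnings;

# Single regex: two lookaheads, each scanning ahead with .* for one term.
# Like the pattern in the text, it is unanchored, so on failure the engine
# retries the lookaheads at every starting offset.
sub lookahead_both {
    my ($s, $x, $y) = @_;
    return $s =~ /(?=.*\Q$x\E)(?=.*\Q$y\E)/ ? 1 : 0;
}

# Two separate searches, short-circuited with &&.
sub separate_both {
    my ($s, $x, $y) = @_;
    return ($s =~ /\Q$x\E/ && $s =~ /\Q$y\E/) ? 1 : 0;
}

for my $s (' xaaaaaaa term2ater ', ' xaaaaaaa ter2ater ', ' Baa term xaaaaaaa mta ') {
    printf "%-28s lookahead=%d separate=%d\n", "'$s'",
        lookahead_both($s, 'term', 'm2a'), separate_both($s, 'term', 'm2a');
}
```

Both forms report 1 for ' xaaaaaaa term2ater ', where the two terms overlap, and 0 for the other two strings.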
So, if you really care about overlap (and I strongly suggest you do), there are
two methods that can be combined for maximum efficiency:
/(A|B) .* (?!\1)(?:A|B)/
or
/A/ && /B/ && /(A|B) .* (?!\1)(?:A|B)/
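The last one adds a small (relative) overhead, but it short-circuits the logic
chain: A and B must at least both be present before the overlap check runs.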
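And depending on where A and B sit in the string, /(A|B) .* (?!\1)(?:A|B)/
can take a while too, but averaged over all the cases, the chained form still
takes the least time.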
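The chained form can be wrapped in a helper. This sketch makes the same assumptions as the benchmark below: the terms are literals, and the helper name `non_overlapping_pair` is hypothetical. It runs the two cheap single-term checks first and only then the backtracking regex. It returns the two terms in string order.

```perl
use strict;
use warnings;

# Returns the two non-overlapping terms in string order, or an empty list.
sub non_overlapping_pair {
    my ($s, $x, $y) = @_;

    # Cheap pre-checks: if either term is absent, && short-circuits and
    # the backtracking regex below never runs.
    return () unless $s =~ /\Q$x\E/ && $s =~ /\Q$y\E/;

    # .* sits between the two terms, so the second one must start after
    # the first one ends; (?!\1) stops it from being the same term again.
    return $s =~ / (\Q$x\E|\Q$y\E) .* ((?!\1)(?:\Q$x\E|\Q$y\E)) /x
        ? ($1, $2)
        : ();
}

for my $s (' xaaaaaaa term2ater ', ' xaaaaaaa term2aterm ') {
    my @pair = non_overlapping_pair($s, 'term', 'm2a');
    printf "%-24s %s\n", "'$s'",
        @pair ? "found '$pair[0]' '$pair[1]'" : 'no non-overlapping pair';
}
```

The first string yields nothing because 'term' and 'm2a' overlap. The second yields 'm2a' 'term', which matches the program output below.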
Below is a Perl program that benchmarks a few sample strings covering possible scenarios.
Good luck!
use strict;
use warnings;
use Benchmark ':hireswallclock';

my ($t0, $t1);
my ($term1, $term2) = ('term', 'm2a');

# Sample strings: overlapping terms, non-overlapping terms, a missing term,
# and long strings with the terms far apart.
my @samples = (
    ' xaaaaaaa term2ater ',
    ' xaaaaaaa term2aterm ',
    ' xaaaaaaa ter2ater ',
    ' Aaa term2ater ' . 'x ' x 100 . 'xaaaaaaa mta ',
    ' Baa term '      . 'x ' x 100 . 'xaaaaaaa mta ',
    ' Caa m2a '       . 'x ' x 100 . 'xaaaaaaa term ',
    ' Daa term2a '    . 'x ' x 100 . 'xaaaaaaa term ',
);

my $rxA  = qr/$term1/;
my $rxB  = qr/$term2/;
# The second term must start after the first ends and must differ from it.
my $rxAB = qr/ ($term1|$term2) .* (?!\1)(?:$term1|$term2) /x;

for (@samples)
{
    printf "Checking string: '%.40s'\n-------------\n", $_;

    # Presence only: both terms somewhere in the string (overlap ignored).
    if (/$term1/ && /$term2/) {
        print " Found possible candidates (A && B)\n";
    }
    # Overlap-aware match, capturing both terms.
    if (/ ($term1|$term2) .* ((?!\1)(?:$term1|$term2)) /x) {
        print " Found non-overlaped terms: '$1' '$2'\n";
    }
    else {
        print " No (A|B) .* (?!\\1)(A|B) terms found!\n";
    }

    print "\n Bench\n";

    # The timing loops use a named counter ($cnt) so that $_ stays
    # aliased to the current sample string.

    # 1) Two separate single-term matches.
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        /$rxA/ && /$rxB/;
    }
    $t1 = Benchmark->new;
    print " $rxA && $rxB\n -took: ", timestr(timediff($t1, $t0)), "\n\n";

    # 2) The overlap-aware regex on its own.
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        /$rxAB/;
    }
    $t1 = Benchmark->new;
    print " $rxAB\n -took: ", timestr(timediff($t1, $t0)), "\n\n";

    # 3) Cheap checks first; the regex only runs if both terms exist.
    $t0 = Benchmark->new;
    for my $cnt (1 .. 500_000) {
        /$rxA/ && /$rxB/ && /$rxAB/;
    }
    $t1 = Benchmark->new;
    print " $rxA && $rxB &&\n $rxAB\n -took: ", timestr(timediff($t1, $t0)), "\n\n";
}
Output:
Checking string: ' xaaaaaaa term2ater '
-------------
Found possible candidates (A && B)
No (A|B) .* (?!\1)(A|B) terms found!
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.46875 wallclock secs ( 1.47 usr + 0.00 sys = 1.47 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 3.3748 wallclock secs ( 3.34 usr + 0.00 sys = 3.34 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 5.0623 wallclock secs ( 5.06 usr + 0.00 sys = 5.06 CPU)
Checking string: ' xaaaaaaa term2aterm '
-------------
Found possible candidates (A && B)
Found non-overlaped terms: 'm2a' 'term'
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.48403 wallclock secs ( 1.49 usr + 0.00 sys = 1.49 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 3.89044 wallclock secs ( 3.89 usr + 0.00 sys = 3.89 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 5.40607 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU)
Checking string: ' xaaaaaaa ter2ater '
-------------
No (A|B) .* (?!\1)(A|B) terms found!
Bench
(?-xism:term) && (?-xism:m2a)
-took: 0.765321 wallclock secs ( 0.77 usr + 0.00 sys = 0.77 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 1.29674 wallclock secs ( 1.30 usr + 0.00 sys = 1.30 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 0.874842 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU)
Checking string: ' Aaa term2ater x x x x x x x x x x x x x'
-------------
Found possible candidates (A && B)
No (A|B) .* (?!\1)(A|B) terms found!
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.46842 wallclock secs ( 1.47 usr + 0.00 sys = 1.47 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 28.078 wallclock secs (28.08 usr + 0.00 sys = 28.08 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 29.4531 wallclock secs (29.45 usr + 0.00 sys = 29.45 CPU)
Checking string: ' Baa term x x x x x x x x x x x x x'
-------------
No (A|B) .* (?!\1)(A|B) terms found!
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.68716 wallclock secs ( 1.69 usr + 0.00 sys = 1.69 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 15.1563 wallclock secs (15.16 usr + 0.00 sys = 15.16 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 1.64033 wallclock secs ( 1.64 usr + 0.00 sys = 1.64 CPU)
Checking string: ' Caa m2a x x x x x x x x x x x x x'
-------------
Found possible candidates (A && B)
Found non-overlaped terms: 'm2a' 'term'
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.62448 wallclock secs ( 1.63 usr + 0.00 sys = 1.63 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 3.0154 wallclock secs ( 3.02 usr + 0.00 sys = 3.02 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 4.56226 wallclock secs ( 4.56 usr + 0.00 sys = 4.56 CPU)
Checking string: ' Daa term2a x x x x x x x x x x x '
-------------
Found possible candidates (A && B)
Found non-overlaped terms: 'm2a' 'term'
Bench
(?-xism:term) && (?-xism:m2a)
-took: 1.45252 wallclock secs ( 1.45 usr + 0.00 sys = 1.45 CPU)
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 16.1404 wallclock secs (16.14 usr + 0.00 sys = 16.14 CPU)
(?-xism:term) && (?-xism:m2a) &&
(?x-ism: (term|m2a) .* (?!\1)(?:term|m2a) )
-took: 17.6719 wallclock secs (17.67 usr + 0.00 sys = 17.67 CPU)
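To double-check that a reported pair such as 'm2a' 'term' really doesn't overlap, one option is to print the offsets of the two capture groups using Perl's `@-` and `@+` arrays. This is a sketch with a hypothetical helper, `pair_offsets`, that reuses the overlap-aware regex from the program.

```perl
use strict;
use warnings;

# Same overlap-aware regex as in the program, but also returns the
# [start, end) offsets of each capture group: $-[n] and $+[n].
sub pair_offsets {
    my ($s, $x, $y) = @_;
    return () unless $s =~ / (\Q$x\E|\Q$y\E) .* ((?!\1)(?:\Q$x\E|\Q$y\E)) /x;
    return ([$1, $-[1], $+[1]], [$2, $-[2], $+[2]]);
}

for my $hit (pair_offsets(' xaaaaaaa term2aterm ', 'term', 'm2a')) {
    my ($text, $start, $end) = @$hit;
    printf "'%s' at [%d, %d)\n", $text, $start, $end;
}
```

For ' xaaaaaaa term2aterm ', 'term' begins exactly at the offset where 'm2a' ends, so the two hits touch but do not overlap.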