php - Is preg_match faster than strpos on large text?

Question

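I am currently updating a very old script, written for PHP 5.2.17, to run on PHP 8.1.2. It has many text-processing blocks, almost all of them using preg_match/preg_match_all. I had always understood that strpos is faster than preg_match for plain string matching, but I decided to check again.

The code is: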

$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i=0; $i < 1000000; $i++) { 
    strpos($c, '[SOMEMACRO]');
}
$el = microtime(true) - $start;
exit($el);

and

$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i=0; $i < 1000000; $i++) { 
    preg_match_all("/\[([a-z0-9-]{0,100})".'[SOMEMACRO]'."/", $c, $pma);
}
$el = microtime(true) - $start;
exit($el);

I used the readme-redist-bins.txt file that ships with the PHP 8.1.2 distribution, about 30KB.

Results (preg_match_all):

PHP_8.1.2: 1.2461s
PHP_5.2.17: 11.0701s

Results (strpos):

PHP_8.1.2: 9.97s
PHP_5.2.17: 0.65s

I double-checked this with both Windows and Linux PHP builds, on two machines.

Then I tried the same code with a small file (200B).

Results (preg_match_all):

PHP_8.1.2: 0.0867s
PHP_5.2.17: 0.6097s

Results (strpos):

PHP_8.1.2: 0.0358s
PHP_5.2.17: 0.2484s

Now the timings look as expected.

So how can preg_match be faster at matching on large text? Any ideas?

P.S. Tried PHP_7.2.10: same results.

score -1 · Accepted Answer

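PCRE2 is really fast. It is so fast that there is hardly any difference between it and plain string handling in PHP, and sometimes it is even faster. PCRE2 uses a JIT internally and includes a lot of optimizations. It is really good at its job.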
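To see whether the JIT mentioned above is actually in play on a given build, something like the following can help. It is only a minimal sketch: the helper name pcre_info_summary is made up for illustration, while PCRE_VERSION and the pcre.jit ini setting are standard parts of PHP's PCRE extension (PHP 7.0 and later for pcre.jit).

```php
<?php
// Hypothetical helper: report which PCRE library this PHP build uses
// and whether the PCRE JIT setting is enabled.
function pcre_info_summary(): array
{
    return [
        // Version string of the linked PCRE library (PCRE2 releases start with "10.").
        'pcre_version' => PCRE_VERSION,
        // "1" when JIT is enabled, "0" when disabled, false if the setting is unknown.
        'jit_setting'  => ini_get('pcre.jit'),
    ];
}

print_r(pcre_info_summary());
```

Running the benchmark once with pcre.jit=1 and once with pcre.jit=0 should give a rough idea of how much of the difference comes from the JIT.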

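strpos, on the other hand, is poorly optimized. It does some simple byte comparisons in C. It does not use parallelization/vectorization. For short needles and short haystacks it uses memchr, but for longer values it runs the Sunday algorithm.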
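For readers unfamiliar with the Sunday algorithm (also known as Quick Search), here is a rough PHP sketch of the idea. It is an illustration only and not PHP's actual C implementation; the function name sunday_search is hypothetical. On a mismatch, the algorithm looks at the byte just past the current window and shifts by a precomputed amount based on where that byte last occurs in the needle.

```php
<?php
// Illustrative Sunday (Quick Search) substring search over bytes.
// Returns the offset of the first match, or false, mirroring strpos().
function sunday_search(string $haystack, string $needle)
{
    $n = strlen($haystack);
    $m = strlen($needle);
    if ($m === 0) {
        return 0;
    }

    // shift[byte] = distance from that byte's last occurrence in the needle
    // to one position past the needle's end.
    $shift = [];
    for ($i = 0; $i < $m; $i++) {
        $shift[$needle[$i]] = $m - $i;
    }

    $pos = 0;
    while ($pos + $m <= $n) {
        // Compare the current window against the needle.
        if (substr_compare($haystack, $needle, $pos, $m) === 0) {
            return $pos;
        }
        if ($pos + $m >= $n) {
            break; // no byte after the window, so nothing left to try
        }
        // Shift based on the byte just past the window; bytes that do not
        // occur in the needle allow a jump of the full needle length plus one.
        $next = $haystack[$pos + $m];
        $pos += $shift[$next] ?? ($m + 1);
    }
    return false;
}

var_dump(sunday_search('hello [SOMEMACRO] world', '[SOMEMACRO]')); // int(6)
```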

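For small datasets, the overhead of calling PCRE2 may outweigh its optimizations, but for larger strings, or for case-insensitive/Unicode strings, PCRE2 may offer better performance.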
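To check this trade-off on your own data, a like-for-like comparison might look like the sketch below. The helper time_it and the generated haystack are assumptions made for illustration. Note that an unescaped [SOMEMACRO] inside a regex is a character class, not literal text, so the sketch uses preg_quote() to make both calls search for the same literal string.

```php
<?php
// Hypothetical helper: run a callable $iterations times, return elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up test data: roughly 27KB of filler with the needle at the very end.
$needle   = '[SOMEMACRO]';
$haystack = str_repeat("lorem ipsum dolor sit amet\n", 1000) . $needle;

// preg_quote() escapes the brackets so the regex matches the literal text
// instead of treating [SOMEMACRO] as a character class.
$pattern = '/' . preg_quote($needle, '/') . '/';

$strposTime = time_it(function () use ($haystack, $needle) {
    strpos($haystack, $needle);
}, 10000);

$pregTime = time_it(function () use ($haystack, $pattern) {
    preg_match($pattern, $haystack);
}, 10000);

printf("strpos:     %.4fs\npreg_match: %.4fs\n", $strposTime, $pregTime);
```

The absolute numbers will vary by PHP version, build, and input, so it is worth varying the haystack size and the needle position before drawing conclusions.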
