The following code produces different results when run via the CLI and via Apache/mod_php:

<pre>
<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');

echo setlocale(LC_ALL, 0)."\n";
// echo setlocale(LC_ALL, "en_GB.UTF-8")."\n";

$terms = array
(
    //Always matches:
    "Label Generation",
    //Doesn't match when using u (PCRE_UTF8) modifier:
    "Receipt of Prescription and Validation of Patient Information",

);

$text       = "Some terms to match: ".implode(", ",$terms);
$pattern    = "/(".implode(")|(", $terms).")/is";
$regexps    = array
(
   "Unicode"     => $pattern."u", //Add u (PCRE_UTF8) modifier
   "Non-unicode" => $pattern
);

echo "Text:\n'$text'\n";

foreach($regexps as $type=>$regexp)
{
    $matches    = array();
    $total      = preg_match_all($regexp,$text,$matches);

    echo "\n\n";
    echo "$type regex:\n'$regexp'\n\n";
    echo "Total $type matches: ";
    var_dump($total);
    echo "\n$type matches: ";
    var_dump($matches[0]);
}
?>
</pre>

CLI output (correct):

<pre>
/en_GB.UTF-8/C/C/C/C/C
Text:
'Some terms to match: Label Generation, Receipt of Prescription and Validation of Patient Information'


Unicode regex:
'/(Label Generation)|(Receipt of Prescription and Validation of Patient Information)/isu'

Total Unicode matches: int(2)

Unicode matches: array(2) {
  [0]=>
  string(16) "Label Generation"
  [1]=>
  string(61) "Receipt of Prescription and Validation of Patient Information"
}


Non-unicode regex:
'/(Label Generation)|(Receipt of Prescription and Validation of Patient Information)/is'

Total Non-unicode matches: int(2)

Non-unicode matches: array(2) {
  [0]=>
  string(16) "Label Generation"
  [1]=>
  string(61) "Receipt of Prescription and Validation of Patient Information"
}
</pre>

Apache/mod_php web server output (incorrect: the second string matches only when the /u modifier is not used):

<pre>
/en_GB.ISO8859-1/C/C/C/C/C
Text:
'Some terms to match: Label Generation, Receipt of Prescription and Validation of Patient Information'


Unicode regex:
'/(Label Generation)|(Receipt of Prescription and Validation of Patient Information)/isu'

Total Unicode matches: int(1)

Unicode matches: array(1) {
  [0]=>
  string(16) "Label Generation"
}


Non-unicode regex:
'/(Label Generation)|(Receipt of Prescription and Validation of Patient Information)/is'

Total Non-unicode matches: int(2)

Non-unicode matches: array(2) {
  [0]=>
  string(16) "Label Generation"
  [1]=>
  string(61) "Receipt of Prescription and Validation of Patient Information"
}
</pre>

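With the /u (PCRE_UTF8) modifier, the web server does not match both strings. I tried setlocale(LC_ALL, "en_GB.UTF-8"); to make the web server's locale match the CLI's, where the code runs correctly, but it made no difference to the output. I suspect a problem with the PCRE library, but I don't understand how it could differ between the CLI and the web server, since PHP reports the same library version in both environments:

PHP 5.4.14
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.32 2012-11-30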

pcretest reports no UTF-8 support, yet the CLI version still produces the correct result:

<pre>
$> pcretest -C
PCRE version 8.32 2012-11-30
Compiled with
  8-bit support
  No UTF-8 support
  No Unicode properties support
  No just-in-time compiler support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
</pre>
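
One way to test the suspicion that the two environments end up using different PCRE code is to look at which libpcre each binary is dynamically linked against. The sketch below is only an illustration: `linked_pcre` is a hypothetical helper name, the paths in the usage comments are assumptions about a typical layout, and an empty result may simply mean that PHP was built with its bundled copy of PCRE.

```shell
#!/bin/sh
# Print the libpcre entries that the dynamic linker resolves for a binary.
# Usage: linked_pcre /path/to/binary
linked_pcre() {
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd not available on this system"
        return 0
    fi
    # ldd lists shared-library dependencies; keep only the PCRE ones.
    ldd "$1" 2>/dev/null | grep -i 'pcre' \
        || echo "$1: no libpcre dependency listed (PCRE may be bundled)"
}

# Assumed paths, adjust them to your installation:
# linked_pcre "$(command -v php)"
# linked_pcre /usr/lib/apache2/modules/libphp5.so
# linked_pcre "$(command -v httpd)"
```

If the Apache binary and the PHP module report different libpcre files, or only one of them links a system libpcre, that would be consistent with the CLI and the web server running different PCRE builds even though PHP prints the same version string.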

3 Answers


This PHP setting helped me:

pcre.jit=0
Answered 2017-09-03T23:01:01.523

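Alastair, I'm digging up this old question because it deals with a perennial problem that will interest coders for generations.

As Dino said, it is very common to have multiple versions of PCRE on the same box. I'm always surprised by how many versions of PCRE are installed on the average cPanel build. That may not be your situation, but it sounds like you have multiple versions too.

To see which PCRE libraries are installed, type in a unix shell:

find / -name 'libpcre.*'

To get meaningful information you'll want to use pcretest, as you have been doing: run find / -name pcretest to locate each copy, then run somepath/pcretest -C for each one.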

If you are on cPanel, then according to cPanel staff the PCRE version installed by EasyApache is the one in the opt/ folder. You can get its version by running

/opt/pcre/bin/pcretest -C
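
The two steps above (locate every pcretest, then ask each one how it was built) can be combined into a small shell function. This is a minimal sketch: `report_pcretest_builds` is a hypothetical name, and it assumes each pcretest prints the "PCRE version" and "UTF-8 support" lines shown in the question.

```shell
#!/bin/sh
# Run "pcretest -C" for every executable named pcretest below a root directory
# and print only the version and UTF-8 lines.
# Usage: report_pcretest_builds [root]   (root defaults to /)
report_pcretest_builds() {
    root="${1:-/}"
    find "$root" -type f -name pcretest 2>/dev/null |
    while IFS= read -r bin; do
        # Skip copies that are not executable by the current user.
        [ -x "$bin" ] || continue
        echo "== $bin"
        "$bin" -C 2>/dev/null | grep -E 'PCRE version|UTF-8 support' \
            || echo "   (no version or UTF-8 line printed)"
    done
}

# Example: scan the whole filesystem (can be slow):
# report_pcretest_builds /
```

Comparing the output blocks side by side makes it easier to spot a build without UTF-8 support sitting next to one that has it.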

It's a mess, but it keeps us on our toes. :)

Answered 2014-04-23T01:21:04.810

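Some Linux distributions (Ubuntu is one) package PHP with separate php.ini files for the CLI and for Apache. If that is your situation, you may want to poke around in /etc/php5 and investigate the differences.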
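
A quick way to investigate those differences is to diff only the active directives of the two files, ignoring comments and blank lines. The sketch below uses hypothetical helper names (`active_directives`, `diff_php_ini`), and the paths in the usage comment assume the Ubuntu php5 layout; check where your own files live first.

```shell
#!/bin/sh
# Print the active (non-comment, non-blank) lines of an ini file, with
# whitespace around "=" normalised, in sorted order.
active_directives() {
    grep -Ev '^[[:space:]]*(;|$)' "$1" |
        sed 's/[[:space:]]*=[[:space:]]*/=/' |
        sort
}

# Show directives that differ between two php.ini files.
# Usage: diff_php_ini cli.ini apache.ini
# Returns diff's exit status: 0 if identical, 1 if they differ.
diff_php_ini() {
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    active_directives "$1" > "$a"
    active_directives "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Assumed Ubuntu php5 paths, adjust if yours differ:
# diff_php_ini /etc/php5/cli/php.ini /etc/php5/apache2/php.ini
```

On the CLI side, `php --ini` prints which configuration files were loaded; under Apache, the "Loaded Configuration File" row of phpinfo() gives the same information.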

Answered 2013-10-25T21:14:16.010