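Edit
I've rewritten the code! It now contains the changes listed below. I've also run extensive tests to look for errors (I won't post them here because there are too many). So far, I haven't found any.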
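The function is now split into two parts: there's a separate function, preg_strip, which takes a regular expression and returns an array containing the bare expression (without delimiters) and an array of modifiers. This might come in handy (in fact, it already has; that's why I made this change).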
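The code now handles back-references correctly. This turned out to be necessary for my purpose after all. It wasn't hard to add, but the regular expression used to capture the back-references looks weird (and may actually be extremely inefficient; it looks NP-hard to me, but that's only an intuition and should only matter in weird edge cases). By the way, does anyone know a better way of checking for an odd number of matches than mine? Negative lookbehinds won't work here because they only accept fixed-length strings rather than regular expressions. However, I need a regular expression here to test whether the preceding backslash is itself escaped.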
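Additionally, I don't know how good PHP is at caching the anonymous function created by create_function. Performance-wise, this might not be the best solution, but it seems good enough.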
I've fixed a bug in the sanity check.
I've removed the cancellation of obsolete modifiers, since my tests show that it isn't necessary.
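On the two open points above (the odd-backslash check and create_function), here is a minimal sketch of one possible alternative. It is not what the posted code does. It assumes PHP 7 or later, and the name shift_backrefs_sketch is made up for illustration. The idea is to let an alternation consume every escape sequence as one unit, so an escaped backslash is swallowed before the digit after it can be mistaken for a back-reference. A closure with `use` stands in for create_function, which was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
/**
 * Hypothetical helper (not part of the posted code): shifts single-digit
 * numbered back-references in $pattern by $offset.
 */
function shift_backrefs_sketch(string $pattern, int $offset): string
{
    // The regex is  \\([1-9])|\\.  once PHP has unescaped the string.
    // Branches are tried left to right at each position and every match
    // consumes a whole escape sequence, so "\\" (an escaped backslash) is
    // eaten by the second branch and the following digit stays literal.
    return preg_replace_callback(
        '/\\\\([1-9])|\\\\./s',
        function (array $m) use ($offset): string {
            if (isset($m[1]) && $m[1] !== '') {
                // A real back-reference: renumber it.
                return '\\' . ((int) $m[1] + $offset);
            }
            // Any other escape sequence: leave it untouched.
            return $m[0];
        },
        $pattern
    );
}

echo shift_backrefs_sketch('(a)\\1', 2), "\n";   // (a)\3
echo shift_backrefs_sketch('(a)\\\\1', 2), "\n"; // (a)\\1  (unchanged)
```

Like the posted regex, this only looks at single-digit references, and it does not treat character classes specially.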
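By the way, this code is one of the core components of a syntax highlighter for various languages that I'm working on in PHP, since I'm not satisfied with the alternatives listed elsewhere.
Thanks!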
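porneL, eyelidlessness, amazing work! Many, many thanks. I had actually given up.
I've built upon your solution and I'd like to share it here. I didn't implement renumbering of back-references since that isn't relevant in my case (I think …). Perhaps it will become necessary later, though.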
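Some questions …
One thing, @eyelidlessness: why do you feel the need to cancel out old modifiers? As far as I can see, this isn't necessary, since modifiers are only applied locally anyway. Ah yes, one other thing: your escaping of the delimiter seems overly complicated. Care to explain why you think this is needed? I believe my version should work as well, but I could be wrong.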
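For what it's worth, the claim that modifiers only apply locally can be checked with a small, self-contained sketch. The pattern here is made up for illustration: an inline option set inside a group stops applying at that group's closing parenthesis.

```php
<?php
// (?i) is set inside the first group only, so it affects "a" but not "b".
$pattern = '/(?:(?i)a)b/';

var_dump(preg_match($pattern, 'Ab')); // int(1): "A" matches case-insensitively
var_dump(preg_match($pattern, 'AB')); // int(0): "b" is still case-sensitive
```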
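Also, I've changed the signature of your function to match my needs. I also think that my version is more generally useful. Again, I might be wrong.
By the way, you should realize by now how important real names are on SO. ;-) I can't give you proper credit in the code. :-/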
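The code
Anyway, I'd like to share my result so far, because I can't believe that nobody else needs something like this. The code seems to work well. It hasn't been tested extensively yet, though. Please comment!
Without further ado …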
/**
* Merges several regular expressions into one, using the indicated 'glue'.
*
* This function takes care of individual modifiers so it's safe to use
* <em>different</em> modifiers on the individual expressions. The order of
* sub-matches is preserved as well. Numbered back-references are adapted to
* the new overall sub-match count. This means that it's safe to use numbered
* back-references in the individual expressions!
* If {@link $names} is given, the individual expressions are captured in
* named sub-matches using the contents of that array as names.
* Matching pair-delimiters (e.g. <code>"{…}"</code>) are currently
* <strong>not</strong> supported.
*
* The function assumes that all regular expressions are well-formed.
* Behaviour is undefined if they aren't.
*
* This function was created after a {@link https://stackoverflow.com/questions/244959/
* StackOverflow discussion}. Much of it was written or thought of by
* “porneL” and “eyelidlessness”. Many thanks to both of them.
*
* @param string $glue A string to insert between the individual expressions.
* This should usually be either the empty string, indicating
* concatenation, or the pipe (<code>|</code>), indicating alternation.
* Notice that this string might have to be escaped since it is treated
* like a normal character in a regular expression (i.e. <code>/</code>
* will end the expression and result in an invalid output).
* @param array $expressions The expressions to merge. The expressions may
* have arbitrary different delimiters and modifiers.
* @param array $names Optional. This is either an empty array or an array of
* strings of the same length as {@link $expressions}. In that case,
* the strings of this array are used to create named sub-matches for the
* expressions.
* @return string A string representing a regular expression equivalent to the
* merged expressions. Returns <code>FALSE</code> if an error occurred.
*/
function preg_merge($glue, array $expressions, array $names = array()) {
    // … then, a miracle occurs.
    // Sanity check …
    $use_names = ($names !== null and count($names) !== 0);
    if (
        $use_names and count($names) !== count($expressions) or
        !is_string($glue)
    )
        return false;
    $result = array();
    // For keeping track of the names for sub-matches.
    $names_count = 0;
    // For keeping track of *all* captures to re-adjust backreferences.
    $capture_count = 0;
    foreach ($expressions as $expression) {
        if ($use_names)
            $name = str_replace(' ', '_', $names[$names_count++]);
        // Get delimiters and modifiers:
        $stripped = preg_strip($expression);
        if ($stripped === false)
            return false;
        list($sub_expr, $modifiers) = $stripped;
        // Re-adjust backreferences:
        // We assume that the expression is correct and therefore don't check
        // for matching parentheses.
        $number_of_captures = preg_match_all('/\([^?]|\(\?[^:]/', $sub_expr, $_);
        if ($number_of_captures === false)
            return false;
        if ($number_of_captures > 0) {
            // NB: This looks NP-hard. Consider replacing.
            $backref_expr = '/
                (                # Only match when not escaped:
                    [^\\\\]      # guarantee an even number of backslashes
                    (\\\\*?)\\2  # (twice n, preceded by something else).
                )
                \\\\ (\d)        # Backslash followed by a digit.
            /x';
            $sub_expr = preg_replace_callback(
                $backref_expr,
                create_function(
                    '$m',
                    'return $m[1] . "\\\\" . ((int)$m[3] + ' . $capture_count . ');'
                ),
                $sub_expr
            );
            $capture_count += $number_of_captures;
        }
        // Last, construct the new sub-match:
        $modifiers = implode('', $modifiers);
        $sub_modifiers = "(?$modifiers)";
        if ($sub_modifiers === '(?)')
            $sub_modifiers = '';
        $sub_name = $use_names ? "?<$name>" : '?:';
        $new_expr = "($sub_name$sub_modifiers$sub_expr)";
        $result[] = $new_expr;
    }
    return '/' . implode($glue, $result) . '/';
}
/**
* Strips a regular expression string of its delimiters and modifiers.
* Additionally, normalize the delimiters (i.e. reformat the pattern so that
* it could have used '/' as delimiter).
*
* @param string $expression The regular expression string to strip.
* @return array An array whose first entry is the expression itself, the
* second an array of modifiers. If the argument is not a valid regular
* expression, returns <code>FALSE</code>.
*
*/
function preg_strip($expression) {
    if (preg_match('/^(.)(.*)\\1([imsxeADSUXJu]*)$/s', $expression, $matches) !== 1)
        return false;
    $delim = $matches[1];
    $sub_expr = $matches[2];
    if ($delim !== '/') {
        // Replace occurrences of the escaped delimiter by its unescaped
        // version and escape the new delimiter.
        $sub_expr = str_replace("\\$delim", $delim, $sub_expr);
        $sub_expr = str_replace('/', '\\/', $sub_expr);
    }
    $modifiers = $matches[3] === '' ? array() : str_split(trim($matches[3]));
    return array($sub_expr, $modifiers);
}
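To give an idea of what the merged output looks like, here is a small sketch. The pattern in `$merged` was traced by hand from the code above for the call shown in the comment, not produced by running it, so treat it as an assumption. The input expressions and the names word and number are made up for illustration.

```php
<?php
// Hand-traced (not executed) result of
// preg_merge('|', array('/[a-z]+/i', '~\d+~'), array('word', 'number')):
$merged = '/(?<word>(?i)[a-z]+)|(?<number>\d+)/';

// The "i" modifier of the first expression only applies inside its group.
preg_match($merged, 'Foo', $m);
echo $m['word'], "\n";   // Foo

// The second expression is reachable through its own named sub-match.
preg_match($merged, '42', $m);
echo $m['number'], "\n"; // 42
```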
PS: I've made this post community wiki, so it's editable. You know what this means …!