php - Regex with word boundaries is matching too loosely

Question

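I have the following code, in which I am trying to use word boundaries to match specific words exactly, replace them with "censored", and then rebuild the text. For some reason, though, the regex is also capturing the trailing space. For clarity, I have reduced it to the following test case: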

<?php

$words = array('bad' => "censored");
$text = "bad bading testbadtest badder";
$newtext = "";

foreach( preg_split( "/(\[\/?(?:acronym|background|\*)(?:=.+?)?\]|(^|\W)bad(\W|$))/i", $text, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY ) as $section )
{
    if ( isset( $words[ $section ] )  )
    {
        $newtext .= $words[ $section ];
    }
    else
    {
        $newtext .= $section ;
    }
}

var_dump($newtext);

exit;

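In this example I expect "bad" to match, but not testbadtest or badder. The problem is that "bad " (note the trailing space) is what gets matched, and that string does not exist as a key in the $words array.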

Can someone explain where I might be going wrong?

Thanks in advance.
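To see where the trailing space comes from, it can help to dump the segments preg_split() returns. The sketch below is an illustration added for clarity, not part of the original post: it keeps only the "bad" alternative of the question's pattern (the bbcode-tag alternative is dropped) and compares it with a version using the zero-width \b assertion, on the question's test string.

```php
<?php
// Sketch: compare the segments preg_split() returns for the question's
// "bad" alternative versus a version that uses zero-width \b assertions.
$text = 'bad bading testbadtest badder';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// (^|\W) and (\W|$) each consume a character, and the outer group captures
// the whole match, so the delimiter segment becomes "bad " (with a space).
// The inner group (\W|$) is returned as its own " " segment as well.
$consuming = preg_split('/((^|\W)bad(\W|$))/i', $text, -1, $flags);

// \b matches a position, not a character, so the captured delimiter is
// exactly "bad" and the surrounding whitespace stays in the other segments.
$zeroWidth = preg_split('/(\bbad\b)/i', $text, -1, $flags);

echo json_encode($consuming), "\n"; // ["bad "," ","bading testbadtest badder"]
echo json_encode($zeroWidth), "\n"; // ["bad"," bading testbadtest badder"]
```

With PREG_SPLIT_DELIM_CAPTURE every capturing group is returned, so the question's lookup sees "bad " and " " but never "bad". With \b the delimiter segment is exactly "bad", which the isset($words[$section]) check would find.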

score 0 · Accepted Answer

I think I would take a different approach, as I'm not sure why you are using preg_split() and hard-coding your censored words into the regex.

Just build an array of the patterns you want to replace, along with their replacements, and pass them to preg_replace().

// note no space in words or their replacements
$word_replacement_map = array(
    'bad' => 'b*d',
    'alsobad' => 'a*****d'
);
$bad_words = array_keys($word_replacement_map);
$patterns = array_map(function($item) {
    // pass the delimiter so preg_quote() also escapes any "/" in the word
    return '/\b' . preg_quote($item, '/') . '\b/u';
}, $bad_words);
$replacements = array_values($word_replacement_map);
$input_string = 'the string with bad and alsobad words';
$cleaned_string = preg_replace($patterns, $replacements, $input_string);
var_dump($cleaned_string); // the string with b*d and a*****d words
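If you want case-insensitive matching like the question's /i flag, or a single pass over the string, one option is to combine the words into one alternation pattern and look up the replacement with preg_replace_callback(). This is an alternative technique, not what the answer above uses, and the sketch assumes the map keys are lowercase ASCII, since it normalises each match with strtolower().

```php
<?php
// Sketch (alternative technique): one combined pattern plus
// preg_replace_callback(), matched case-insensitively in a single pass.
// Assumption: map keys are lowercase ASCII words.
$word_replacement_map = array(
    'bad' => 'b*d',
    'alsobad' => 'a*****d',
);

// Escape each word (including the "/" delimiter) and join them into
// /\b(?:bad|alsobad)\b/iu
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, array_keys($word_replacement_map));
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/iu';

$input_string = 'The string with Bad and alsobad words';
$cleaned_string = preg_replace_callback($pattern, function ($m) use ($word_replacement_map) {
    // The match may be "Bad"; lowercase it to find the map key.
    return $word_replacement_map[strtolower($m[0])];
}, $input_string);

echo $cleaned_string, "\n"; // The string with b*d and a*****d words
```

The trade-off is that the replacement logic moves into PHP code, while the answer's array form keeps each pattern and replacement as plain data.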

Note that if you don't need word-specific replacements, you can simply boil this down to:

// note no space in words
$bad_words = array(
    'bad',
    'alsobad'
);
$replacement = 'censored';
$patterns = array_map(function($item) {
    return '/\b' . preg_quote($item, '/') . '\b/u';
}, $bad_words);
$input_string = 'the string with bad and alsobad words';
$cleaned_string = preg_replace($patterns, $replacement, $input_string);
var_dump($cleaned_string); // the string with censored and censored words

Note that I am using word boundaries (\b) in the regex patterns, which should usually meet your needs.
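Applied to the question's own test string, the \b approach replaces only the standalone word. The sketch below reuses the answer's structure; the i flag is an assumption added here to mirror the question's pattern. The last call shows one case where "usually" matters: \b treats a hyphen as a non-word character, so "bad-tempered" is still censored.

```php
<?php
// Sketch: the answer's \b approach run on the question's test string.
// Assumption: the "i" flag is added to mirror the question's /i pattern.
$bad_words = array('bad');
$replacement = 'censored';
$patterns = array_map(function ($item) {
    return '/\b' . preg_quote($item, '/') . '\b/iu';
}, $bad_words);

$text = 'bad bading testbadtest badder';
$newtext = preg_replace($patterns, $replacement, $text);
echo $newtext, "\n"; // censored bading testbadtest badder

// A hyphen is not a word character, so \b matches between "d" and "-".
$hyphenated = preg_replace($patterns, $replacement, 'bad-tempered');
echo $hyphenated, "\n"; // censored-tempered
```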
