3

我有一个代码将输出与数组的值进行比较,并且只用数组中的单词终止操作:

第一个代码(只是一个例子

$myVar = 'essa pizza é muito gostosa, que prato de bom sabor';
$myWords=array(
    array('sabor','gosto','delicia'),
    array('saborosa','gostosa','deliciosa'),
);

foreach($myWords as $words){
    shuffle($words); // randomize the subarray
    // pipe-together the words and return just one match
    if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
        // generate "replace_pair" from matched word and a random remaining subarray word
        // replace and preserve the new sentence
        $myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
    }
}
echo $myVar;

我的问题:

我有第二个代码,它不适用于 rand/shuffle(我不想要 rand,我想要精确的替换,我总是将第 0 列更改为 1),是要始终交换值:

// wrong output: $myVar = "minha irmã alanné é not aquela blnode, elere é a bom plperito";
$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(array("is","é"),
    array("on","no"),
    array("that","aquela"),
    //array("blonde","loira"),
    //array("not","não"),
    array("sister","irmã"), 
    array("my","minha"),
    //array("nothing","nada"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito"),
   // array("here","aqui"), //if [here] it does not exist, it is not to do replacement from the line he=ele = "elere" non-existent word  
); 
$replacements = array_combine(array_column($myWords,0),array_column($myWords,1));
$myVar = strtr($myVar,$replacements);
echo $myVar;
// expected output:  minha irmã alannis é not aquela blonde, here é a bom place
//  avoid replace words slice!

预期输出: minha irmã alannis é not aquela 金发女郎,here é a bom place

    //  avoid replace words slice! always check if the word exists in the array before making the substitution.

alanne , blnode , elere , plperito

它检查输出是否是真实的单词,它们存在于数组 myWords 中,这样可以避免键入错误,例如:

那4个字不是存在的字,是书写错误。你如何为第二个代码做到这一点?

   简而言之,必须通过一个完整的词/键,一个现有的词来进行交换。并且不要使用关键字片段创建一些奇怪的东西!

4

2 回答 2

1

不幸strtr()的是,这项工作的工具是错误的,因为它是“无知的词边界”。要定位整个单词,没有比使用带有单词边界的正则表达式模式更简单的方法了。

此外,为了确保在较短的字符串(可能存在于其他字符串中的字符串)之前匹配较长的字符串,您必须$myWords按字符串长度排序(从降序/最长到最短;仅在必要时使用多字节版本)。

对单词数组进行排序并转换为单个正则表达式模式后,您可以将pattern数组replace输入preg_replace().

代码(演示

$myVar = "my sister alannis is not that blonde, here is a good place";
$myWords=array(
    array("is","é"),
    array("on","no"),
    array("that","aquela"),
    array("sister","irmã"), 
    array("my","minha"),
    array("myth","mito"),
    array("he","ele"),
    array("good","bom"),
    array("ace","perito")
); 
usort($myWords,function($a,$b){return mb_strlen($b[0])<=>mb_strlen($a[0]);});  // sort subarrays by first column multibyte length
// remove mb_ if first column holds no multi-byte characters.  strlen() is much faster.

foreach($myWords as &$words){
    $words[0]='/\b'.$words[0].'\b/i';  // generate patterns using search word, word boundaries, and case-insensitivity
}

//var_export($myWords);
//var_export(array_column($myWords,0));
//var_export(array_column($myWords,1));

$myVar=preg_replace(array_column($myWords,0),array_column($myWords,1),$myVar);
echo $myVar;

输出:

minha irmã alannis é not aquela blonde, here é a bom place

这没有做的是欣赏匹配子串的情况。我的意思是,my并且My都将被minha.

为了适应不同的外壳,您需要使用preg_replace_callback().

这是考虑因素(处理大写首字母单词,而不是所有大写单词):

代码 ( Demo ) <--运行这个来查看替换后保留的原始大小写。

foreach($myWords as $words){
    $myVar=preg_replace_callback(
        $words[0],
        function($m)use($words){
            return ctype_upper(mb_substr($m[0],0,1))?
                mb_strtoupper(mb_substr($words[1],0,1)).mb_strtolower(mb_substr($words[1],1)):
                $words[1];
        },
        $myVar);
}
echo $myVar;
于 2017-11-12T23:19:13.407 回答
1

我以前的方法非常低效。我没有意识到您正在处理多少数据,但是如果我们超过 4000 行,那么效率至关重要(我想我的大脑一直在思考strtr()基于您之前的问题的相关处理)。这是我的新/改进的解决方案,我希望将我以前的解决方案留在尘埃中。

代码:(演示

$myVar = "My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!";
echo "$myVar\n";

$myWords = [
    ["is", "é"],
    ["on", "no"],
    ["that", "aquela"],
    ["sister", "irmã"], 
    ["my", "minha"],
    ["myth", "mito"],
    ["he", "ele"],
    ["good", "bom"],
    ["ace", "perito"],
    ["i", "eu"]  // notice I must be lowercase
];
$translations = array_column($myWords, 1, 0);  // or skip this step and just declare $myWords as key-value pairs

// length sorting is not necessary
// preg_quote() and \Q\E are not used because dealing with words only (no danger of misinterpretation by regex)

$pattern = '/\b(?>' . implode('|', array_keys($translations)) . ')\b/i';  // atomic group is slightly faster (no backtracking)
/* echo $pattern;
   makes: /\b(?>is|on|that|sister|my|myth|he|good|ace)\b/i
   demo: https://regex101.com/r/DXTtDf/1
*/
$translated = preg_replace_callback(
    $pattern,
    function($m) use($translations) {  // bring $translations (lookup) array to function
        $encoding = 'UTF-8';  // default setting
        $key = mb_strtolower($m[0], $encoding);  // standardize keys' case for lookup accessibility
        if (ctype_lower($m[0])) { // treat as all lower
            return $translations[$m[0]];
        } elseif (mb_strlen($m[0], $encoding) > 1 && ctype_upper($m[0])) {  // treat as all uppercase
            return mb_strtoupper($translations[$key], $encoding);
        } else {  // treat as only first character uppercase
            return mb_strtoupper(mb_substr($translations[$key], 0, 1, $encoding), $encoding)  // uppercase first
                   . mb_substr($translations[$key], 1, mb_strlen($translations[$key], $encoding) - 1, $encoding);  // append remaining lowercase
        }
    },
    $myVar
);
    
echo $translated;

输出:

My sister alannis Is not That blonde, here is a good place. I know Ariane is not MY SISTER!
Minha irmã alannis É not Aquela blonde, here é a bom place. Eu know Ariane é not MINHA IRMÃ!

这种方法:

  • 只通过1$myVar次,而不是 1 的每个子数组通过$myWords.
  • 不打扰对查找数组 ( $myWords/ $translations) 进行排序。
  • 不需要正则表达式转义 ( preg_quote()) 或使模式组件成为文字 ( \Q..\E),因为只有单词被翻译。
  • 使用单词边界,以便只替换完整的单词匹配。
  • 使用原子组作为微优化,在拒绝回溯的同时保持准确性。
  • 声明$encoding稳定性/可维护性/可重用性的值。
  • 匹配不区分大小写但替换为区分大小写...如果英文匹配是:
    1. 全小写,替换也是
    2. 全大写(并且大于单个字符),替换也是
    3. 大写(仅多字符串的第一个字符),替换也是
于 2018-01-02T21:58:41.267 回答