php - Undefined offset and diacritics

Question

I am trying to parse Laotian text with utf8_ireplace, but I keep getting "Undefined offset" notices.

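One thing I can see is that the text contains diacritics. Could they be causing that warning? Or can someone tell me why it is always Lao (out of the 6 languages I am working with)?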

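Is there anything special about how utf8_replace handles Lao and similar languages (such as Tibetan)? Is it a known issue that it raises notices on certain characters in these languages? Are the diacritics the problem, or is it something else? Does anyone know how to avoid the notices, other than turning off notice reporting?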

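Update: I should add that Lao has no spaces between words, so the string has to be split apart; that is what I use utf8_replace for. It fails on Lao, even though it seems to work for Thai. So I really am trying to break the string apart, but for some reason the offset is undefined. Tibetan also seems to have problems, for example "ས" (U+0F66).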
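Because Lao vowel and tone marks are encoded as separate combining code points, any split that works one code point at a time (as a character class such as `[\x{0E80}-\x{0EFF}]` does) can separate a mark from the consonant it belongs to. A minimal sketch, assuming PHP 7+ and PCRE's `\X` escape (one extended grapheme cluster); `splitGraphemes` is a hypothetical helper, not part of Joomla or phputf8. Note that this splits text into characters, not words.

```php
<?php
// splitGraphemes() is a hypothetical helper, not a Joomla or phputf8 function.
// \X (with the u modifier) matches one extended grapheme cluster, so combining
// vowel and tone marks stay attached to the consonant they belong to.
function splitGraphemes(string $text): array
{
    // preg_match_all() returns false on failure, e.g. for invalid UTF-8 input.
    if (preg_match_all('/\X/u', $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// LAO LETTER KO (U+0E81) + LAO VOWEL SIGN II (U+0EB5, combining) + LAO LETTER KHO SUNG (U+0E82)
$word = "\u{0E81}\u{0EB5}\u{0E82}";

// Splitting per code point detaches the vowel sign: 3 pieces.
$perCodePoint = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

// Splitting per grapheme keeps KO + II together: 2 pieces.
$perGrapheme = splitGraphemes($word);
```

The `\X` approach leaves the decision about what counts as one character to PCRE's Unicode tables rather than to hand-written code point ranges.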

Update

Here is the central question: why do I get notices from utf8_replace for some Lao words?

(Joomla)

// Iterate through the terms and test if they contain the relevant characters.
for ($i = 0, $n = count($terms); $i < $n; $i++)
{
    $charMatches = array();
    if ($lang === 'zh')
    {
        $charCount = preg_match_all('#[\x{4E00}-\x{9FCF}]#mui', $terms[$i], $charMatches);
    }

    elseif ($lang === 'ja')
    {
        // Kanji (Han), Katakana and Hiragana are each checked
        $charCount = preg_match_all('#[\x{4E00}-\x{9FCF}]#mui', $terms[$i], $charMatches);
        $charCount += preg_match_all('#[\x{3040}-\x{309F}]#mui', $terms[$i], $charMatches);
        $charCount += preg_match_all('#[\x{30A0}-\x{30FF}]#mui', $terms[$i], $charMatches);
    }
    elseif ($lang === 'th')
    {
        $charCount = preg_match_all('#[\x{0E00}-\x{0E7F}]#mui', $terms[$i], $charMatches);
    }
    elseif ($lang === 'km')
    {
        $charCount = preg_match_all('#[\x{1780}-\x{17FF}]#mui', $terms[$i], $charMatches);
    }
    elseif ($lang === 'lo')
    {
        $charCount = preg_match_all('#[\x{0E80}-\x{30EFF}]#mui', $terms[$i], $charMatches);
    }
    elseif ($lang === 'my')
    {
        $charCount = preg_match_all('#[\x{1000}-\x{109F}]#mui', $terms[$i], $charMatches);
    }
    elseif ($lang === 'bo')
    {
        $charCount = preg_match_all('#[\x{0F00}-\x{0FFF}]#mui', $terms[$i], $charMatches);
    }
    // Split apart any groups of characters.
    for ($j = 0; $j < $charCount; $j++)
    {
        if (isset($charMatches[0][$j]))
        {
            $tSplit = JString::str_ireplace($charMatches[0][$j], '', $terms[$i], null);

            if (!empty($tSplit))
            {
                $terms[$i] = $tSplit;
            }
            else
            {
                unset($terms[$i]);
            }

            $terms[] = $charMatches[0][$j];
        }
    }
}

// Reset array keys.
$terms = array_values($terms);
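Separately from the character-class patterns, the loop above can read `$terms[$i]` after unsetting it. `str_ireplace` removes every occurrence of the matched character, so a term in which a matched character appears twice (for example a doubled Lao consonant) becomes empty and is unset before the inner `$j` loop finishes; the next iteration then reads the unset element, which PHP reports as an "Undefined offset" notice ("Undefined array key" in PHP 8). A minimal standalone sketch of the same loop with that read guarded; `splitTerms` is a hypothetical name, and plain `str_replace` stands in for Joomla's `JString::str_ireplace`, an assumption that only holds because the scripts involved have no letter case.

```php
<?php
// splitTerms() is a hypothetical standalone version of the question's loop.
// str_replace() stands in for Joomla's JString::str_ireplace(); this is only
// equivalent because the scripts in the question have no letter case.
function splitTerms(array $terms, string $pattern): array
{
    $terms = array_values($terms);
    for ($i = 0, $n = count($terms); $i < $n; $i++) {
        $charMatches = [];
        $charCount = (int) preg_match_all($pattern, $terms[$i], $charMatches);

        for ($j = 0; $j < $charCount; $j++) {
            // str_replace() removes every occurrence, so a repeated character
            // can empty and unset the term before the last match is reached.
            // Guard the read that would otherwise raise "Undefined offset".
            if (isset($terms[$i])) {
                $tSplit = str_replace($charMatches[0][$j], '', $terms[$i]);
                // Compare with '' rather than empty(), so a remaining "0" is kept.
                if ($tSplit !== '') {
                    $terms[$i] = $tSplit;
                } else {
                    unset($terms[$i]);
                }
            }
            // Each matched character becomes its own term, as in the original.
            $terms[] = $charMatches[0][$j];
        }
    }
    return array_values($terms);
}

// A term made of one repeated Lao letter used to trigger the notice.
$result = splitTerms(["\u{0E81}\u{0E81}"], '#[\x{0E80}-\x{0EFF}]#u');
```

The output is the same as the original loop's for such terms; the only difference is that the already-unset element is no longer read.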

score 0 · Accepted Answer

I think the offset error may be referring to preg_match. I tested the regex for 'lo' on regex101.com, and it returned this error:

\x{30EFF} Character offset is too large. Reduce it to 4 hexadecimal characters or enable UTF-16 (u-modifier)

The other regexes tested just fine.
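The Lao Unicode block runs from U+0E80 to U+0EFF, so `\x{30EFF}` looks like a typo for `\x{0EFF}`. With the `u` modifier that the question's patterns use, PCRE accepts code points above U+FFFF, so in PHP the typo'd range compiles and spans everything from U+0E80 up to U+30EFF (Tibetan, Myanmar, Khmer, CJK and more), not just Lao. A minimal sketch comparing the two ranges; the constant names are illustrative only.

```php
<?php
// Lao block: U+0E80 to U+0EFF. The question's pattern ends at U+30EFF instead.
const LAO_PATTERN  = '#[\x{0E80}-\x{0EFF}]#u';
const TYPO_PATTERN = '#[\x{0E80}-\x{30EFF}]#u';

// LAO LETTER KO (U+0E81) followed by a CJK ideograph (U+4E2D).
$sample = "\u{0E81}\u{4E2D}";

$laoCount  = preg_match_all(LAO_PATTERN, $sample);  // counts only the Lao letter
$typoCount = preg_match_all(TYPO_PATTERN, $sample); // also counts the CJK ideograph
```

In the question's code, that means the `lo` branch would also pull non-Lao characters out of a term as separate "characters".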
