php - Merging two regular expressions to truncate words in a string

Question

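I'm trying to come up with the following function, which truncates a string to whole words (if possible; otherwise it should truncate to characters):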

function Text_Truncate($string, $limit, $more = '...')
{
    $string = trim(html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

    if (strlen(utf8_decode($string)) > $limit)
    {
        $string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)~su', '$1', $string);

        if (strlen(utf8_decode($string)) > $limit)
        {
            $string = preg_replace('~^(.{' . intval($limit) . '}).*~su', '$1', $string);
        }

        $string .= $more;
    }

    return trim(htmlentities($string, ENT_QUOTES, 'UTF-8', true));
}

Here are some tests:

// Iñtërnâtiônàlizætiøn and then the quick brown fox... (49 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

// Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_... (50 + 3 chars)
echo Text_Truncate('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died.', 50, '...');

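They both work as is, but if I drop the second preg_replace() I get the following:

Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog and one day the lazy dog humped the poor fox down until she died....

I can't use substr() because it only works at the byte level, and I don't have access to mb_substr() at the moment. I've made several attempts to merge the second regex into the first one, without success.

Please help, SMS; I've been struggling with this for almost an hour.

EDIT: Sorry, I've been awake for 40 hours and I shamelessly missed this: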

$string = preg_replace('~^(.{1,' . intval($limit) . '})(?:\s.*|$)?~su', '$1', $string);

Still, if someone has a more optimized regex (or one that ignores the trailing space), please share:

"Iñtërnâtiônàlizætiøn and then "
"Iñtërnâtiônàlizætiøn_and_then_"

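EDIT 2: I still can't get rid of the trailing whitespace. Can someone help me out?

EDIT 3: Okay, none of my edits actually worked; I was being fooled by RegexBuddy. I should probably leave this for another day and get some sleep now. Off for today.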
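
To illustrate the byte-versus-character problem mentioned in the question, here is a small sketch. The sample string and variable names are made up for illustration; it assumes the source file and input are valid UTF-8.

```php
<?php
$word = 'Iñtër';

// strlen() counts bytes: "ñ" and "ë" take two bytes each in UTF-8.
$bytes = strlen($word);                   // 7

// With the /u flag, "." matches a whole code point, so this counts characters.
$chars = preg_match_all('~.~su', $word);  // 5

// substr() cuts by byte and can split a multibyte character in half:
// this yields "I" plus only the first byte of "ñ".
$byteCut = substr($word, 0, 2);

// A /u regex takes whole characters instead.
preg_match('~^.{1,2}~su', $word, $m);
$charCut = $m[0];                         // "Iñ"
```

This is why the regex approach with the u modifier is a workable substitute when mb_substr() is not available.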

score 3 · Accepted Answer

Perhaps I can give you a happy morning after a long night of RegExp nightmares:

'~^(.{1,' . intval($limit) . '}(?<=\S)(?=\s)|.{'.intval($limit).'}).*~su'

Boiling it down:

^      # Start of String
(       # begin capture group 1
 .{1,x} # match 1 - x characters
 (?<=\S)# lookbehind, match must end with non-whitespace 
 (?=\s) # lookahead, if the next char is whitespace, match
 |      # otherwise test this:
 .{x}   # got to x chars anyway.
)       # end cap group
.*     # match the rest of the string (since you were using replace)

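You can always add |$ to the end of the lookahead, giving (?=\s|$), but since your code already checks that the string is longer than $limit, I didn't feel that case was necessary.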
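
As a rough sketch of how this pattern might slot into a single-pass function: the name truncate_words is hypothetical, the entity encoding/decoding from the question is left out, and $limit is assumed to be at least 1 (a limit of 0 would produce an invalid {1,0} quantifier).

```php
<?php
// Hypothetical helper; the pattern itself is the one given in this answer.
function truncate_words(string $string, int $limit, string $more = '...'): string
{
    // Count characters rather than bytes: with /u, "." matches one code point.
    $length = preg_match_all('~.~su', $string);

    if ($length <= $limit) {
        return $string; // Nothing to truncate.
    }

    // First alternative: up to $limit chars ending on a non-space, followed by whitespace.
    // Second alternative: no such word boundary exists, so cut at exactly $limit chars.
    $pattern = '~^(.{1,' . $limit . '}(?<=\S)(?=\s)|.{' . $limit . '}).*~su';

    return preg_replace($pattern, '$1', $string) . $more;
}

echo truncate_words('Iñtërnâtiônàlizætiøn and then the quick brown fox jumped overly the lazy dog', 50), "\n";
echo truncate_words('Iñtërnâtiônàlizætiøn_and_then_the_quick_brown_fox_jumped_overly_the_lazy_dog', 50), "\n";
```

With the two test strings from the question, the first call should stop after "fox" with no trailing space, and the second should fall back to a hard cut at 50 characters.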

score 0 · Accepted Answer


Have you considered using wordwrap()? ( http://us3.php.net/wordwrap )
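
A minimal sketch of what that could look like, assuming ASCII-only input: wordwrap() counts bytes, not characters, so it would not handle the multibyte strings from the question correctly. The function name is hypothetical.

```php
<?php
// Hypothetical helper built on wordwrap(); byte-based, so ASCII input is assumed.
function truncate_with_wordwrap(string $string, int $limit, string $more = '...'): string
{
    if (strlen($string) <= $limit) {
        return $string; // Nothing to truncate.
    }

    // Wrap at word boundaries; the fourth argument (true) also cuts words longer than $limit.
    $wrapped = wordwrap($string, $limit, "\n", true);

    // Keep only the first wrapped line.
    return explode("\n", $wrapped, 2)[0] . $more;
}

echo truncate_with_wordwrap('The quick brown fox sat over the lazy dog', 15), "\n";
```

Because of the byte counting, this is only a fit when the text is known to be single-byte.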

Answered 2010-04-22T08:09:49.500
