I need to shorten a given text (in varying encodings!) to a maximum length, e.g. 140 characters, without touching the links.
Example:
Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat. http://bit.ly/222222 Sed diam voluptua. At vero eos et accusam et justo duo dolores. http://bit.ly/111111
It should end up as:
Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy... http://bit.ly/222222 http://bit.ly/111111
My actual code, with an example, is here: http://phpfiddle.org/lite/code/er7-sty
function shortenMessage($message,$limit=140,$encoding='utf-8') {
  if (mb_strlen($message,$encoding) <= $limit) return $message;
  echo '<pre><h3>Original message:<br />'.$message.'<hr>';
  # search positions of links
  $reg_exUrl = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
  preg_match_all ($reg_exUrl, $message, $links,PREG_OFFSET_CAPTURE);
  echo 'Links found:<br />';
  var_dump($links[0]);
  echo '<hr>';
  $position = array();
  $len = 0;
  # search utf-8 position of links
  foreach ($links[0] as $values) {
    $url = $values[0];
    $offset = $values[1];
    #$pos = mb_strpos($message, $url, $offset, $encoding); # doesn't work
    $pos = mb_strpos($message, $url, 0, $encoding);
    $position[$pos] = $url;
    # delete url from string
    $message = str_replace($url, '', $message);
    $len += mb_strlen($url,$encoding); # sum length of urls to subtract from $maxlenght
  }
  echo 'UTF-8 Positions:<br />';
  var_dump($position);
  echo '<hr>';
  # shorten text
  $maxlenght = $limit - $len - 7; # 7 is a security buffer
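  while ($maxlenght < 0 && $position) { # too many urls? then cut some...
    $firstpos = array_key_first($position); # PHP >= 7.3
    $len -= mb_strlen($position[$firstpos],$encoding); # subtract the dropped url
    unset($position[$firstpos]); # unlike array_shift, keeps the position keys
    $maxlenght = $limit - $len - 7;
  }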
  echo 'UTF-8 Positions shortened:<br />';  
  var_dump($position);
  echo '<hr>';
  $message = mb_substr($message,0,$maxlenght,$encoding).'... ';
  echo 'Shortened message without urls:<br />'; 
  var_dump($message);
  echo '<hr>';
  # re-insert urls at right positions
  $addpos = 0;
  foreach ($position as $pos => $url) {
    $pos += $addpos;
    if ($pos < mb_strlen($message,$encoding)) {
      $message = mb_substr($message,0,$pos,$encoding).$url.mb_substr($message,$pos,mb_strlen($message,$encoding),$encoding);
    } else {
      $message .= ' '.$url;
    }
    $addpos += mb_strlen($url,$encoding);
  }
  echo 'Shortened message:<br />';
  var_dump($message); 
  echo '<hr>';
  return $message;
}
It works when the text contains only distinct links, but it fails when a link is repeated.
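A minimal sketch of the suspected cause, using a made-up message rather than the real input: `str_replace` removes every occurrence of the URL in the first pass, so the second lookup for the repeated URL finds nothing. Since PHP casts a `false` array key to `0`, `$position[$pos] = $url` would then file the repeated link under position 0.

```php
<?php
// Hypothetical two-link message; the URL is repeated on purpose.
$message = 'a http://bit.ly/111111 b http://bit.ly/111111';
$url = 'http://bit.ly/111111';

// First pass: locate the first occurrence, then strip the URL.
$first = mb_strpos($message, $url, 0, 'UTF-8');   // 2
$message = str_replace($url, '', $message);        // removes BOTH occurrences

// Second pass for the repeated URL: nothing is left to find.
$second = mb_strpos($message, $url, 0, 'UTF-8');  // false

var_dump($first, $second, $message);
```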
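I have already tried using the positions from preg_match_all as the offset for mb_strpos, but I think this fails because of the preg_match UTF-8 problem: PREG_OFFSET_CAPTURE reports byte offsets, while mb_strpos expects character offsets.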
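One way the byte offsets might be made usable is to count the characters in the bytes that precede each match. The following is only a sketch: `byteToCharOffset` is a hypothetical helper name, the sample message is invented, and it assumes the text is UTF-8 (other encodings would presumably need converting first) and that every match starts on a character boundary.

```php
<?php
// Hypothetical helper: convert a byte offset (as reported by
// PREG_OFFSET_CAPTURE) into a character offset by measuring the
// character length of the bytes in front of it.
function byteToCharOffset(string $text, int $byteOffset, string $encoding = 'UTF-8'): int
{
    return mb_strlen(substr($text, 0, $byteOffset), $encoding);
}

// Invented sample with multibyte characters and a repeated link.
$message = 'Füße: http://bit.ly/111111 äöü http://bit.ly/111111';
preg_match_all('~https?://\S+~', $message, $links, PREG_OFFSET_CAPTURE);

$charOffsets = [];
foreach ($links[0] as [$url, $byteOffset]) {
    // Each occurrence gets its own character position, duplicates included.
    $charOffsets[] = byteToCharOffset($message, $byteOffset);
    echo $url, ' bytes=', $byteOffset, ' chars=', end($charOffsets), PHP_EOL;
}
```

Because the position comes from the match itself and not from a fresh `mb_strpos` search, two identical links no longer collapse into a single entry.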
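I have seen similar questions about shortening text tweet-style without cutting the links inside, but they do not deal with encodings and are about handling HTML tags...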