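I wrote a script that sends large blocks of text to Google for translation. Sometimes the text, which is HTML source code, gets split in the middle of an HTML tag, and Google then returns garbled markup.
I already know how to split a string into an array. Is there a better way to do it that guarantees each output string is at most 5000 characters long and never splits inside a tag?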
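For context, here is how the problem shows up with a naive fixed-size split. This minimal sketch uses PHP's built-in `str_split()`, which cuts every N bytes with no awareness of markup, and an artificially small limit of 10 bytes so the effect is easy to see:

```php
<?php
// Naive fixed-width split: str_split() cuts every 10 bytes,
// regardless of where HTML tags begin or end.
$html = '<p>Hello <b>world</b></p>';
$chunks = str_split($html, 10);
print_r($chunks);
```

The `<b>` tag ends up divided between the first chunk (`'<p>Hello <'`) and the second (`'b>world</b'`), which is exactly the kind of fragment a translation service can mangle.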
Update: thanks for the answers. This is the code I ended up using in my project, and it works well:
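```php
/**
 * Split an HTML string into chunks of at most $maxSize bytes without
 * cutting through a tag. A single tag or text run longer than $maxSize
 * still ends up alone in its own (oversized) chunk.
 */
function handleTextHtmlSplit($text, $maxSize) {
    // our collection array, starting with one empty chunk
    $niceHtml = [''];
    // Split on tags, but also keep each tag as an item in the result;
    // PREG_SPLIT_NO_EMPTY drops the empty strings between adjacent tags
    $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    // index of the chunk currently being filled
    $currentPiece = 0;
    // keep appending pieces to the current chunk until it would exceed max size
    foreach ($pieces as $piece) {
        // if appending this piece would exceed the limit (strlen counts bytes)
        // and the current chunk is not empty, start a new chunk; the emptiness
        // check avoids leaving an empty first chunk when one piece is oversized
        if ($niceHtml[$currentPiece] !== '' && strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
            $currentPiece += 1;
            $niceHtml[$currentPiece] = '';
        }
        // append the piece to the current chunk
        $niceHtml[$currentPiece] .= $piece;
    }
    // return array of nicely handled html
    return $niceHtml;
}
```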
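One caveat: `strlen()` counts bytes, not characters, so for multibyte UTF-8 text (accented or non-Latin scripts) a byte-based limit produces chunks smaller than a 5000-character limit would allow. If a character-based limit matters, a variant can measure length in code points instead. The following is a minimal sketch, assuming the input is valid UTF-8. The names `splitHtmlByChars` and `utf8Length` are hypothetical, and code points are counted with `preg_match_all` and the `u` modifier so the mbstring extension is not required:

```php
<?php
// Hypothetical helper: number of UTF-8 code points in $s.
// Each match of /./su is one code point; preg_match_all returns the count.
function utf8Length(string $s): int {
    return (int) preg_match_all('/./su', $s);
}

// Hypothetical variant of the splitter that limits chunks by code points.
// Assumes $text is valid UTF-8.
function splitHtmlByChars(string $text, int $maxChars): array {
    $chunks = [''];
    $current = 0;
    $currentLen = 0;
    // Split on tags, keeping each tag as its own piece; drop empty pieces.
    $pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($pieces as $piece) {
        $len = utf8Length($piece);
        // Start a new chunk if this piece would overflow a non-empty chunk.
        if ($currentLen > 0 && $currentLen + $len > $maxChars) {
            $current++;
            $chunks[$current] = '';
            $currentLen = 0;
        }
        $chunks[$current] .= $piece;
        $currentLen += $len;
    }
    return $chunks;
}

// Usage: 'Grüße' is 5 characters but 7 bytes in UTF-8.
$parts = splitHtmlByChars('<p>Grüße</p><p>Welt</p>', 12);
print_r($parts);
```

With a 12-character limit, `'<p>Grüße</p>'` is exactly 12 characters and stays together, whereas a byte count would see 14 bytes and push the closing `</p>` into the next chunk. Tracking a running length also avoids re-measuring the whole chunk on every iteration.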