I picked up this suggestion from @Qtax:
UTF-8 characters in preg_match_all (PHP)
For further reference, the bug surfaced while I was using this:
Truncate text contains HTML, ignoring tags
The gist of the change is this:
    $orig_utf = 'UTF-8';
    $new_utf  = 'UTF-32';

    // Run the multibyte regex functions on fixed-width UTF-32 input.
    mb_regex_encoding( $new_utf );

    $html     = mb_convert_encoding( $html, $new_utf, $orig_utf );
    $end_char = mb_convert_encoding( $end_char, $new_utf, $orig_utf );

    mb_ereg_search_init( $html );

    // The pattern has to be in the same encoding as the subject string.
    $pattern = '</?([a-z]+)[^>]*>|&#?[a-zA-Z0-9]+;';
    $pattern = mb_convert_encoding( $pattern, $new_utf, $orig_utf );

    while ( $printed < $limit && $tag_match = mb_ereg_search_pos( $pattern, $html ) ) {
        // mb_ereg_search_pos() returns a byte offset and a byte length.
        // UTF-32 uses 4 bytes per character, so divide by 4 to get characters.
        $tag_position = $tag_match[0]/4;
        $tag_length   = $tag_match[1];
        $tag = mb_substr( $html, $tag_position, $tag_length/4, $new_utf );
        $tag_name = preg_replace( '/[\s<>\/]+/', '', $tag );

        // Print text leading up to the tag.
        $str = mb_substr($html, $position, $tag_position - $position, $new_utf );
        .......
    }
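To illustrate why the offsets get divided by 4, here is a minimal sketch. It is not part of the truncation function: the sample string and variable names are made up for illustration, and plain `strpos()` stands in for a byte-offset search such as `mb_ereg_search_pos()`. It assumes PHP 7+ with the mbstring extension.

```php
<?php
// Hypothetical sample input; "\u{...}" escapes keep this independent of the file's encoding.
$utf8  = "h\u{00E9}llo <b>w\u{00F6}rld</b>";
$utf32 = mb_convert_encoding( $utf8, 'UTF-32', 'UTF-8' );

// In UTF-8 the byte offset of '<' differs from its character index,
// because the accented 'e' occupies two bytes.
$utf8_byte_offset = strpos( $utf8, '<' );
$utf8_char_index  = mb_strpos( $utf8, '<', 0, 'UTF-8' );

// In UTF-32 every character is exactly 4 bytes,
// so byte offset / 4 is the character index.
$needle            = mb_convert_encoding( '<', 'UTF-32', 'UTF-8' );
$utf32_byte_offset = strpos( $utf32, $needle );
$utf32_char_index  = intdiv( $utf32_byte_offset, 4 );

// Cut the three-character tag out by character index, then convert back for display.
$tag      = mb_substr( $utf32, $utf32_char_index, 3, 'UTF-32' );
$tag_utf8 = mb_convert_encoding( $tag, 'UTF-8', 'UTF-32' );

printf( "UTF-8: byte %d, char %d\n", $utf8_byte_offset, $utf8_char_index );
printf( "UTF-32: byte %d, char %d, tag %s\n", $utf32_byte_offset, $utf32_char_index, $tag_utf8 );
```

With a fixed-width encoding the conversion from bytes to characters is a plain division, which is what the loop relies on.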
In addition, truncating an HTML page needs some other changes:
    $first_char = mb_substr( $tag, 0, 1, $new_utf );
    // Convert the literal to UTF-32 before comparing it with a UTF-32 character.
    if ( $first_char == mb_convert_encoding( '&', $new_utf, $orig_utf ) ) {
        ...
    }
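My text editor saves files as UTF-8, so the '&' literal in the source file is a single byte. Comparing a UTF-32 character directly against that literal would never match, which is why the literal is converted first.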
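A small sketch of that mismatch, assuming PHP 7+ with mbstring. The variable names are made up for illustration and `'&amp;'` is just a sample entity:

```php
<?php
// The ampersand as it appears in a UTF-8 (or ASCII) source file: one byte.
$amp_utf8 = '&';

// The same character in UTF-32: four bytes.
$amp_utf32 = mb_convert_encoding( $amp_utf8, 'UTF-32', 'UTF-8' );

// A sample entity converted the same way as $html in the loop.
$tag        = mb_convert_encoding( '&amp;', 'UTF-32', 'UTF-8' );
$first_char = mb_substr( $tag, 0, 1, 'UTF-32' );

$matches_raw_literal = ( $first_char == $amp_utf8 );  // byte strings differ
$matches_converted   = ( $first_char == $amp_utf32 ); // same four bytes

printf( "literal: %d byte(s), UTF-32: %d byte(s)\n", strlen( $amp_utf8 ), strlen( $amp_utf32 ) );
var_dump( $matches_raw_literal, $matches_converted );
```

Only the comparison against the converted literal succeeds, because both sides then hold the same four-byte sequence.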