php - Remove HTML entity if incomplete

Question

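I have a problem: I display a string of at most 400 characters pulled from a database, and this string needs to contain HTML entities.

By chance, the client has created a string whose 400th character falls right in the middle of a closing P tag, which kills the tag and causes other errors in the code that follows it.

I would like this closing P tag to be removed entirely, since I append a "...Read more" link to the end, and it would look cleaner attached to the existing paragraph.

What would be the best approach to cover all HTML entity issues? Is there a PHP function that will automatically close or remove any broken HTML tags? I don't need a coded answer; just a direction would help a lot.

Thanks.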

score 4 · Accepted Answer

Here is a simple approach using DOMDocument. It is not perfect, but it may be of interest:

<?php 
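function html_tidy($src){
    // Collect libxml parse warnings internally instead of emitting them
    libxml_use_internal_errors(true);
    $x = new DOMDocument;
    // The meta tag makes loadHTML treat the input as UTF-8
    $x->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'.$src);
    $x->formatOutput = true;
    // Strip the doctype and the html/head/body wrappers that DOMDocument adds
    $ret = preg_replace('~<(?:!DOCTYPE|/?(?:html|body|head))[^>]*>\s*~i', '', $x->saveHTML());
    // Remove the meta tag that was prepended above
    return trim(str_replace('<meta http-equiv="Content-Type" content="text/html;charset=utf-8">','',$ret));
}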

$brokenHTML[] = "<p><span>This is some broken html</spa";
$brokenHTML[] = "<poken html</spa";
$brokenHTML[] = "<p><span>This is some broken html</spa</p>";

/*
<p><span>This is some broken html</span></p>
<poken html></poken>
<p><span>This is some broken html</span></p>
*/
foreach($brokenHTML as $test){
    echo html_tidy($test);
}

?>

Do take note of Mike 'Pomax' Kamermans' comment, though.
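Since the question asks for the "Read more" link to sit inside the existing paragraph, the same DOMDocument idea can be pushed one step further. The following is only a sketch under assumptions: the function name `excerpt_with_link`, the `$url` parameter and the `/article/42` URL are invented for illustration, and how the parser repairs a given fragment can vary between libxml versions, especially when the cut lands inside an attribute value.

```php
<?php
// Hypothetical helper: repair a truncated HTML fragment with DOMDocument and
// append a "Read more" link inside its last <p>. $url is an assumed parameter.
function excerpt_with_link(string $fragment, string $url): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Same UTF-8 hint as in the answer above; the parser closes open tags.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />' . $fragment);
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing parseable in the fragment
    }

    // Reuse the last paragraph if there is one, otherwise add a new one.
    $paragraphs = $doc->getElementsByTagName('p');
    $target = $paragraphs->length > 0
        ? $paragraphs->item($paragraphs->length - 1)
        : $body->appendChild($doc->createElement('p'));

    $link = $doc->createElement('a');
    $link->setAttribute('href', $url);
    $link->appendChild($doc->createTextNode('... Read more'));
    $target->appendChild($doc->createTextNode(' '));
    $target->appendChild($link);

    // Serialize only the children of <body>, so no wrapper tags need stripping.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

echo excerpt_with_link('<p><span>This is some broken html</spa', '/article/42');
```

Serializing the children of `<body>` one by one avoids the regular-expression cleanup of the doctype and wrapper tags, at the cost of a few more lines.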

score 0 · Accepted Answer

Just remove the last broken tag, then apply strip_tags:

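$str = "<p>this is how we do</p";
$pos = strrpos($str, "<");
// Cut only if the last "<" is never closed by ">" (a truncated tag)
if ($pos !== false && strpos($str, ">", $pos) === false) $str = substr($str, 0, $pos);
$str = strip_tags($str);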
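Combined with the 400-character cut from the question, this idea could be wrapped up roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `plain_excerpt`, the `$limit` parameter and the `/article/42` URL are made up for illustration, and `mb_substr` assumes the mbstring extension is available and the text is UTF-8.

```php
<?php
// Hypothetical helper: cut $html to $limit characters, drop a tag that the
// cut left unfinished, then strip the remaining tags to get plain text.
function plain_excerpt(string $html, int $limit = 400): string
{
    $cut = mb_substr($html, 0, $limit, 'UTF-8');

    // A trailing "<" with no ">" after it means the cut landed inside a tag.
    $lastOpen = strrpos($cut, '<');
    if ($lastOpen !== false && strpos($cut, '>', $lastOpen) === false) {
        $cut = substr($cut, 0, $lastOpen);
    }

    return trim(strip_tags($cut));
}

// Usage: the link is appended inside a single paragraph, as the question wants.
$excerpt = plain_excerpt('<p>this is how we do</p', 400);
echo '<p>' . $excerpt . ' <a href="/article/42">... Read more</a></p>';
```

Because every tag is stripped, any formatting inside the excerpt is lost; that is the trade-off for not having to balance tags at all.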

score 0 · Accepted Answer

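Why not simply drop the last word of the paragraph or content? Whether that word is complete or not, you remove it, so you can be sure the content stays clean. Sample code would look like this: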

while ($row = $req->fetch(PDO::FETCH_OBJ)) {
  //extract 400 first characters from the content you need to show
  $extraction = substr($row->text, 0, 400);
  // find the last space in this extraction
  $last_space = strrpos($extraction, ' ');
  //take content from the first character to the last space and add (...)
  echo substr($extraction, 0, $last_space) . ' ...';
}
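The same word-boundary cut can be written as a standalone function that also copes with multibyte text and with strings that contain no space. This is a sketch under assumptions: `excerpt_at_word` and `$limit` are hypothetical names, the mbstring extension is assumed to be available, and the input is assumed to be plain text (for example, after `strip_tags`), since a space inside a tag such as `<a href=...>` would otherwise still be a possible cut point.

```php
<?php
// Hypothetical helper: keep at most $limit characters, then back up to the
// last space so the excerpt never ends in the middle of a word.
function excerpt_at_word(string $text, int $limit = 400): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // short enough, nothing to cut
    }

    $cut = mb_substr($text, 0, $limit, 'UTF-8');

    // mb_strrpos returns false when there is no space; keep the hard cut then.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return $cut . ' ...';
}

echo excerpt_at_word('The quick brown fox', 12); // prints "The quick ..."
```

Unlike the loop above, this version leaves text that is already within the limit unchanged and does not append the ellipsis to it.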
