3

How can I remove all HTML tags in PHP except text wrapped in <> characters?

// There are other HTML tags too, like h1, div, etc.
echo strip_tags('<gone with the wind> <p>a hotest book</p>');

This returns a hotest book, but I need to keep the book title. I need the function to return <gone with the wind> a hotest book

4

7 Answers

5

You should consider using &lt; (<) and &gt; (>).
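A minimal sketch of what this suggestion could look like in practice. It assumes the data source can be changed so that the title's angle brackets arrive as entities; the variable names are illustrative only.

```php
<?php
// Assumption: the title's angle brackets are already stored as entities.
$raw = '&lt;gone with the wind&gt; <p>a hotest book</p>';

// Entities are not tags, so strip_tags() leaves the title alone.
$stripped = strip_tags($raw);

// Decode only if you need plain text; for HTML output keep the entities.
$plain = html_entity_decode($stripped);

echo $plain; // <gone with the wind> a hotest book
```

If the result is echoed into an HTML page, output $stripped rather than $plain so the browser does not treat the title as a tag.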

answered 2013-01-04T15:28:03.790
3

The following uses DOM to find any elements that are not valid HTML4 elements and treats them as book titles. These are then whitelisted for strip_tags.

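$html = '<gone with the wind> <p>a hotest book</p>';

// Collect parser errors instead of emitting warnings for unknown tags.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

// Each "Tag X invalid" error names an unknown element; whitelist those.
echo strip_tags($html, implode('',
    array_map(
        function($error) {
            return '<' . sscanf($error->message, 'Tag %s invalid')[0] . '>';
        },
        libxml_get_errors()
    )
));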


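Note that any book title starting with a valid HTML tag name will be treated as valid HTML and therefore stripped (for example "Body of Evidence" or "Head First PHP"). Also note that <gone with the wind> is parsed as an element "gone" with the empty attributes "with", "the" and "wind". For valid elements, you could check whether they have only empty attributes and strip them only if they do not, but this still will not be 100% accurate when a title consists solely of valid element names. You could also check for closing tags, but I do not know how to do that with DOM (XMLParser can detect them, though).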

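In any case, finding a better format for these book titles, such as using namespaces or delimiters other than angle brackets, will greatly improve your chances of doing this correctly.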
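As a rough illustration of the "different delimiter" idea, the sketch below assumes titles are wrapped in guillemets (« ») instead of angle brackets. That convention is hypothetical and not part of the original data.

```php
<?php
// Hypothetical convention: book titles use guillemets, not angle brackets.
$raw = '«gone with the wind» <p>a hotest book</p>';

// strip_tags() only reacts to "<", so the title passes through untouched.
$clean = strip_tags($raw);

echo $clean; // «gone with the wind» a hotest book
```

With an unambiguous delimiter, no guessing about which bracketed text is a tag is needed at all.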

answered 2013-01-04T15:44:16.320
1

You can also do this more simply.

   <?php
   $string = htmlspecialchars("<gone with the wind>");
   echo strip_tags( "$string <p>a hotest book</p>");
   ?>

Rendered in a browser, this will output:

   <gone with the wind> a hotest book


answered 2013-01-04T15:49:08.217
1

Here is a simple, though not foolproof, solution for you.

PHP

$data = "<gone with the wind> <p>a hotest book</p>";
$out = preg_replace("/\<\w+\>|\<\/\w+\>/im", "", $data);

var_dump($out);

Output

string '<gone with the wind> a hotest book' (length=34)

Will match

<p>text</p>
<anything>text</anything>

Will not match

<img src="url">

As was said before, the code has no way of knowing what a book title looks like.

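However, if you expect your data to contain only simple tags such as <p>, this will work.

It's a crazy solution, but I thought I'd throw it out there.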

answered 2013-01-04T15:40:07.303
0
$string = '<gone with the wind> <p>a hotest book</p>';
$string = strip_tags(preg_replace("/<([\w\s\d]{6,})>/", "&lt;$1&gt;", $string));
$string = html_entity_decode($string);

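The above converts the <> around any "tag" of six or more characters into &lt; and &gt;, which lets you use strip_tags.

You may need to experiment with the value six depending on the incoming data. If you receive a tag like <article>, you may need to push it higher.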
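To make experimenting with the threshold easier, the same idea can be wrapped in a function. This is only a sketch: the function name stripTagsKeepLongNames and the $minLength parameter are made up for illustration, and \d is dropped from the character class because \w already covers digits.

```php
<?php
// Hypothetical helper; $minLength is the tunable threshold discussed above.
function stripTagsKeepLongNames(string $input, int $minLength = 6): string
{
    // Escape "<...>" runs whose content is at least $minLength word/space chars.
    $pattern = '/<([\w\s]{' . $minLength . ',})>/';
    $escaped = preg_replace($pattern, '&lt;$1&gt;', $input);

    // Real tags are stripped; the escaped titles are then decoded back.
    return html_entity_decode(strip_tags($escaped));
}

$sample = '<gone with the wind> <article><p>a hotest book</p></article>';

echo stripTagsKeepLongNames($sample, 6), "\n"; // <gone with the wind> <article>a hotest book
echo stripTagsKeepLongNames($sample, 8), "\n"; // <gone with the wind> a hotest book
```

With a threshold of 6 the opening <article> tag survives as if it were a title, which is the failure mode described above; raising it to 8 removes the tag but would also remove any title shorter than eight characters.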

answered 2013-01-04T15:37:44.173
0

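The best thing I can think of is something like this. Since I don't know what kinds of tags will be used, I simply assume all of them. This should remove any valid HTML tag, rather than anything that merely looks like it could be a tag.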

<?php
$tags = array("!DOCTYPE","a","abbr","acronym","address","applet","area","article","aside","audio","b","base","basefont","bdi","bdo","big","blockquote","body","br","button","canvas","caption","center","cite","code","col","colgroup","command","datalist","dd","del","details","dfn","dir","div","dl","dt","em","embed","fieldset","figcaption","figure","font","footer","form","frame","frameset","h1","h2","h3","h4","h5","h6","head","header","hgroup","hr","html","i","iframe","img","input","ins","kbd","keygen","label","legend","li","link","map","mark","menu","meta","meter","nav","noframes","noscript","object","ol","optgroup","option","output","p","param","pre","progress","q","rp","rt","ruby","s","samp","script","section","select","small","source","span","strike","strong","style","sub","summary","sup","table","tbody","td","textarea","tfoot","th","thead","time","title","tr","track","tt","u","ul","var","video","wbr");

$string = "<gone with the wind> <p>a hotest book</p>";


// \b stops short names such as "a" from matching longer words (e.g. "<art ...>").
echo preg_replace("/<\/?(" . implode("|", $tags) . ")\b[^>]*>/i", "", $string);

The final output looks like this:

<gone with the wind> a hotest book
answered 2013-01-04T15:57:59.777
0

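You are out of luck here, because there is no way to know which of the things inside <> are HTML tags and which are book titles. You can't even write something that looks for things that look like tags but aren't valid HTML tags, because you might get a record for the Monkees' 1968 film "Head" as <Head>, which is certainly a valid HTML tag.

You need to sort this out with the supplier of the data before you can use PHP's strip_tags function.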
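The ambiguity is easy to reproduce. In this small sketch the record text is invented for illustration; the point is only that strip_tags cannot tell a title from a tag.

```php
<?php
// Invented sample record: the film title "Head" written in angle brackets.
$record = '<Head> <p>a 1968 film by the Monkees</p>';

// strip_tags() sees <Head> as a tag and silently drops the title.
$result = strip_tags($record);

var_dump($result); // string(27) " a 1968 film by the Monkees"
```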

answered 2013-01-04T16:22:54.677