php - Excluding the title from HTML using a regular expression

Question

I am fetching a page's HTML code.

I can remove all the HTML tags and scripts. I also want to remove <title> whatever here </title>

I tested all the solutions on SO. None of them helped.

What is wrong here?

function plaintext($html)
{
    $plaintext = preg_replace('#([<]title)(.*)([<]/title[>])#', ' ', $html);

    //$plaintext = preg_match('#<title>(.*?)</title>#', $html);

    // remove comments and any content found in the comment area (strip_tags only removes the actual tags).
    $plaintext = preg_replace('#<!--.*?-->#s', '', $plaintext);

    // put a space between list items (strip_tags just removes the tags).
    $plaintext = preg_replace('#</li>#', ' </li>', $plaintext);

    // remove all script and style tags
    $plaintext = preg_replace('#<(script|style)\b[^>]*>(.*?)</(script|style)>#is', "", $plaintext);

    // remove br tags (missed by strip_tags)
    $plaintext = preg_replace("#<br[^>]*?>#", " ", $plaintext);

    // remove all remaining html
    $plaintext = strip_tags($plaintext);

    return $plaintext;
}

score 0 · Accepted Answer

Your code looks fine. I mean, using your function like this:

function plaintext($html)
{
    $plaintext = preg_replace('#([<]title)(.*)([<]/title[>])#', ' ', $html);

    return $plaintext;
}
$page = file_get_contents('http://www.example.com');
echo plaintext($page);

the output has no title tag.
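One caveat, as a hedged sketch: the pattern has no `s` modifier, so `.` does not match a newline, and a title that spans several lines would be left in place. The function name `strip_title_original` and both sample inputs below are made up for illustration; they are not taken from the page in the question.

```php
<?php
// The pattern from the question, unchanged, wrapped in a hypothetical helper.
function strip_title_original(string $html): string
{
    return preg_replace('#([<]title)(.*)([<]/title[>])#', ' ', $html);
}

// Hypothetical sample inputs.
$oneLine   = "<head><title>Example Domain</title></head><body>Hi</body>";
$multiLine = "<head><title>\nExample Domain\n</title></head><body>Hi</body>";

// Title on a single line: the whole element is replaced by a space.
echo strip_title_original($oneLine), "\n";

// Title spanning several lines: `.` stops at "\n", the pattern never matches,
// and the input comes back unchanged.
var_dump(strip_title_original($multiLine) === $multiLine);
```

If the fetched page breaks the title across lines, that alone could explain why the function appears to do nothing.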

score 0 · Accepted Answer

Try:

preg_replace('#<title\b[^>]*>(.*?)</title>#is', '', $html);
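A minimal sketch of this suggestion wrapped in a function, under two assumptions: `#` is used as the delimiter so the `/` in `</title>` needs no escaping, and the `s` modifier is included so a title spanning several lines is matched too. The function name `remove_title` and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: removes the <title> element before any further stripping.
function remove_title(string $html): string
{
    // i: case-insensitive, so <TITLE> is matched as well.
    // s: `.` also matches newlines, so multi-line titles are covered.
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace('#<title\b[^>]*>.*?</title>#is', '', $html) ?? $html;
}

// Made-up sample page with an upper-case, multi-line title.
$html = "<html><head><TITLE>\nExample Domain\n</TITLE></head><body><p>Hello</p></body></html>";

echo strip_tags(remove_title($html)); // prints "Hello"
```

The return value has to be used or assigned; `preg_replace` does not modify `$html` in place.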

answered 2013-11-06T19:01:55.503
