asp.net-mvc-3 - Removing HTML formatting in Razor MVC 3

Question

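I am using MVC 3 and the Razor view engine.

What I want to do

I am building a blog with MVC 3, and I want to remove all HTML formatting tags such as <p>, <b>, <i>, etc.

I am using the following code. (It does work.)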

 @{
 post.PostContent = post.PostContent.Replace("<p>", " ");   
 post.PostContent = post.PostContent.Replace("</p>", " ");
 post.PostContent = post.PostContent.Replace("<b>", " ");
 post.PostContent = post.PostContent.Replace("</b>", " ");
 post.PostContent = post.PostContent.Replace("<i>", " ");
 post.PostContent = post.PostContent.Replace("</i>", " ");
 }

I feel there must be a better way to do this. Could anyone please guide me?

score 23 · Accepted Answer

Thanks, Alex Yaroshevich.

This is what I am using now:

post.PostContent = Regex.Replace(post.PostContent, @"<[^>]*>", String.Empty);
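A minimal sketch of how the one-liner above could be wrapped in a reusable helper, so the view does not have to modify the model. The names `HtmlText` and `StripTags` are hypothetical and not part of the original answer. The second call shows a limitation of this pattern: any text that happens to sit between a `<` and a `>` is removed, even when it is not a tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Same pattern as above: "<", any run of characters other than ">", then ">".
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    // Returns the input with everything that looks like a tag removed.
    // Null or empty input is passed through unchanged.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        return AnyTag.Replace(html, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<p>Hello, <b>World</b></p>")); // Hello, World
        Console.WriteLine(StripTags("a < b and c > d"));            // a  d  (the text between < and > is lost)
    }
}
```

In a Razor view this could then be called as `@HtmlText.StripTags(post.PostContent)`, assuming the helper's namespace is visible to the view.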

score 2 · Accepted Answer

Regular expressions are slow. Use this instead; it is faster:

public static string StripHtmlTagByCharArray(string htmlString)
{
    char[] array = new char[htmlString.Length];
    int arrayIndex = 0;
    bool inside = false;

    for (int i = 0; i < htmlString.Length; i++)
    {
        char let = htmlString[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}

You can take a look at http://www.dotnetperls.com/remove-html-tags
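The speed claim is easy to check against your own content. The following is only a rough sketch using `Stopwatch`; the class name `StripBenchmark`, the sample string, and the iteration count are assumptions made for illustration, and the numbers will vary by machine and input, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class StripBenchmark
{
    // Same tag pattern as the regex answer, compiled once and reused.
    private static readonly Regex TagRegex = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    public static string StripWithRegex(string html)
    {
        return TagRegex.Replace(html, string.Empty);
    }

    // Condensed copy of the character-array approach from this answer.
    public static string StripWithCharArray(string html)
    {
        char[] buffer = new char[html.Length];
        int length = 0;
        bool inside = false;
        foreach (char c in html)
        {
            if (c == '<') { inside = true; continue; }
            if (c == '>') { inside = false; continue; }
            if (!inside) { buffer[length++] = c; }
        }
        return new string(buffer, 0, length);
    }

    // Runs the given stripper `iterations` times and returns elapsed milliseconds.
    public static long Time(Func<string, string> strip, string html, int iterations)
    {
        strip(html); // warm-up call so JIT and regex compilation are not measured
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(html);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string sample = "<p>Hello, <b>World</b>! <i>Some</i> more text.</p>";
        Console.WriteLine("regex:      {0} ms", Time(StripWithRegex, sample, 100000));
        Console.WriteLine("char array: {0} ms", Time(StripWithCharArray, sample, 100000));
    }
}
```

Both methods return the same text for ordinary markup, so the choice mostly comes down to measured speed on your real posts.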

score 0 · Accepted Answer

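Just in case you want to use a regular expression in .NET to strip HTML tags, the following seems to work well on the source of this very page. It is better than some of the other answers on this page because it looks for actual HTML tags rather than blindly removing everything between < and >. Back in the BBS days we typed <grin> a lot instead of :), so removing <grin> is not an option. :)

This solution only removes the tags. It does not remove the contents of those tags in cases where that might matter, for example a script tag. You would see the script text, but the script would not execute because the script tag itself has been removed. Removing the contents of an HTML tag is very tricky, and in practice requires the HTML fragment to be well-formed...

Also note the RegexOptions.Singleline option. This is very important for any block of HTML, because there is nothing wrong with opening an HTML tag on one line and closing it on another.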

string strRegex = @"</{0,1}(!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|main|map|mark|menu|menuitem|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";
Regex myRegex = new Regex(strRegex, RegexOptions.Singleline);
string strTargetString = @"<p>Hello, World</p>";
string strReplace = @"";

return myRegex.Replace(strTargetString, strReplace);
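To illustrate why `RegexOptions.Singleline` matters here, the sketch below uses the same pattern shape with the tag list shortened to `a|b|i|p` for readability (the shortened list and the class name `SinglelineDemo` are assumptions for illustration). Without the option, `.` does not match a newline, so an opening tag whose attributes span two lines is left in place.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Same shape as the pattern above, with a much shorter tag list.
    private const string Pattern = @"</{0,1}(a|b|i|p){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";

    public static string Strip(string html, RegexOptions options)
    {
        return new Regex(Pattern, options).Replace(html, string.Empty);
    }

    public static void Main()
    {
        // The opening <a> tag is split across two lines.
        string html = "<a href=\"x\"\n   title=\"y\">link</a>";

        // With Singleline, "." also matches the newline, so the whole tag is removed.
        Console.WriteLine(Strip(html, RegexOptions.Singleline)); // link

        // Without it, only </a> is removed and the two-line opening tag survives.
        Console.WriteLine(Strip(html, RegexOptions.None));
    }
}
```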

I am not saying this is the best answer. It is just one option, and it has worked well for me.
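If the leftover script text mentioned above is a problem, one possible approach is to remove whole `script` and `style` elements before stripping the remaining tags. This is only a sketch under the assumption that those elements are properly closed; the class name `ScriptAwareStripper` is hypothetical, and this is not a substitute for a real HTML sanitizer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptAwareStripper
{
    // Matches a whole <script>...</script> or <style>...</style> element, body included.
    // Assumes the element is properly closed; an unclosed <script> only loses its tag.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Simple "anything between < and >" pattern from the earlier answer.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    public static string Strip(string html)
    {
        // First drop script/style elements together with their contents,
        // then remove whatever tags remain.
        string withoutBlocks = ScriptOrStyle.Replace(html, string.Empty);
        return AnyTag.Replace(withoutBlocks, string.Empty);
    }

    public static void Main()
    {
        string html = "<p>Hi </p><script>alert('x');</script><b>there</b>";
        Console.WriteLine(Strip(html)); // Hi there
    }
}
```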
