c# - Strip HTML tags and decode entities in C#

Question

Are there .NET equivalents of the PHP functions strip_tags and html_entity_decode? I am using .NET 3.5.

So if I have:

<textarea cols="5">Some &lt; text</textarea>

I would get:

Some < text

Thanks for your replies.

score 5 · Accepted Answer

You can use HtmlAgilityPack ...

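// requires: using System.Linq; using System.Web; (reference System.Web.dll for HttpUtility)
string html = @"<textarea cols=""5"">Some &lt; text</textarea>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// InnerText drops the markup but leaves entities such as &lt; encoded
var text = doc.DocumentNode.Descendants("textarea").First().InnerText;
var decodedText = HttpUtility.HtmlDecode(text);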
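
The answer above reads a single textarea element. As a minimal sketch of the same idea applied to a whole fragment, the helper below takes InnerText of the document node and decodes it with HtmlAgilityPack's own HtmlEntity.DeEntitize, so no System.Web reference is needed. The class and method names (HapTextSketch, ToPlainText) are hypothetical, and the sketch assumes the HtmlAgilityPack assembly is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class HapTextSketch
{
    // Hypothetical helper: returns the text of an HTML fragment with entities decoded.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates the text nodes but leaves entities encoded,
        // so decode them afterwards with HtmlAgilityPack's HtmlEntity.DeEntitize.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

Calling HapTextSketch.ToPlainText on the textarea markup from the question is intended to yield the decoded text with the tags removed.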

score 2 · Accepted Answer

Use a Regex to replace tags matching <.*?> with an empty string, and the HttpUtility class to decode entities.

answered 2012-10-29T19:41:26.550
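
A minimal sketch of that suggestion, combining both steps in one helper. The names PlainTextSketch and ToPlainText are made up for illustration, and the RegexOptions.Singleline flag is an addition not mentioned in the answer. The order matters: if the entities were decoded first, &lt; would become a literal < and the regex could treat it as the start of a tag and remove the text after it.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web; // HttpUtility; on .NET 3.5 this needs a reference to System.Web.dll

public static class PlainTextSketch
{
    // Hypothetical helper: strips tags with a regex, then decodes HTML entities.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Singleline lets "." match newlines, so tags that wrap across lines are removed too.
        string withoutTags = Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);

        // Decode last, so a decoded "<" is never mistaken for markup.
        return HttpUtility.HtmlDecode(withoutTags);
    }
}
```

For the input from the question, PlainTextSketch.ToPlainText("<textarea cols=\"5\">Some &lt; text</textarea>") returns "Some < text". A regex is not an HTML parser, so this is only suitable for simple, reasonably well-formed markup.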

score 2 · Accepted Answer

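I'd like to share the code I wrote for this. I like PHP, but my job is in C#, so I re-created the strip_tags functionality as StripTag.

Examples of how to use it: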

string exampleOneWithAllStripped = StripTag("<br />this is an <b>example</b>", null);

string exampleTwoWithOnlyBoldAllowed = StripTag("<br />this is an <b>example</b>", "b");

string exampleThreeWithBRandBoldAllowed = StripTag("<br />this is an <b>example</b>", "b,<br>");

    /// <summary>
    ///     Strips HTML and other markup tags from the given string, keeping only the tags named in listOfAllowedTags.
    ///     This method is the ASP.NET version of the PHP strip_tags function. It strips out all HTML and XML tags
    ///     except for the ones explicitly allowed in the second parameter. For allowed tags, this method DOES NOT
    ///     strip out attributes.
    /// </summary>
    /// <param name="htmlString">
    ///     The HTML string.
    /// </param>
    /// <param name="listOfAllowedTags">
    ///     A comma-separated list of allowed tags. If null, no tags are allowed. Example: "b,<br/>,<hr>,p,i,<u>"
    /// </param>
    /// <returns>
    ///     Cleaned String
    /// </returns>
    /// <author>James R.</author>
    /// <createdate>10-27-2011</createdate>
    public static string StripTag(string htmlString, string listOfAllowedTags)
    {
        if (string.IsNullOrEmpty(htmlString))
        {
            return htmlString;
        }

        // this is the reg pattern that will retrieve all tags
        string patternThatGetsAllTags = "</?[^><]+>";

        // Create the Regex for all of the Allowed Tags
        string patternForTagsThatAreAllowed = string.Empty;
        if (!string.IsNullOrEmpty(listOfAllowedTags))
        {
            // get the HTML starting tag, such as p,i,b from an example string of <p>,<i>,<b>
            Regex remove = new Regex("[<>\\/ ]+");

            // now strip out /\<> and spaces
            listOfAllowedTags = remove.Replace(listOfAllowedTags, string.Empty);

            // split at the commas
            string[] listOfAllowedTagsArray = listOfAllowedTags.Split(',');

            foreach (string allowedTag in listOfAllowedTagsArray)
            {
                if (string.IsNullOrEmpty(allowedTag))
                {
                    // jump to next element of array.
                    continue;
                }

                string patternVersion1 = "<" + allowedTag + ">"; // plain opening tag, e.g. <p>

                // opening tag followed by a space, e.g. <img src=stuff> or <hr style="width:50%;" />
                string patternVersion2 = "<" + allowedTag + " [^><]*>$";
                string patternVersion3 = "</" + allowedTag + ">"; // closing tag, e.g. </p>

                // if it is not the first time, then add the pipe | to the end of the string
                if (!string.IsNullOrEmpty(patternForTagsThatAreAllowed))
                {
                    patternForTagsThatAreAllowed += "|";
                }

                patternForTagsThatAreAllowed += patternVersion1 + "|" + patternVersion2 + "|" + patternVersion3;
            }
        }

        // Get all html tags included in the string
        Regex regexHtmlTag = new Regex(patternThatGetsAllTags);

        if (!string.IsNullOrEmpty(patternForTagsThatAreAllowed))
        {
            MatchCollection allTagsThatMatched = regexHtmlTag.Matches(htmlString);

            // build the allowed-tag regex once instead of once per matched tag
            Regex regOfAllowedTag = new Regex(patternForTagsThatAreAllowed);

            foreach (Match theTag in allTagsThatMatched)
            {
                Match matchOfTag = regOfAllowedTag.Match(theTag.Value);

                if (!matchOfTag.Success)
                {
                    // if not allowed replace it with nothing
                    htmlString = htmlString.Replace(theTag.Value, string.Empty);
                }
            }
        }
        else
        {
            // else strip out all tags
            htmlString = regexHtmlTag.Replace(htmlString, string.Empty);
        }

        return htmlString;
    }
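
As written, StripTag compares tag names case-sensitively and only recognises an allowed tag as <tag>, </tag>, or <tag followed by a space, so <B> or <br/> (no space before the slash) is removed even when b or br is allowed. The following is a sketch of one possible variant that handles those cases by capturing the tag name and looking it up in a case-insensitive set. It is not the author's code: the names TagStripperSketch and StripTagsExcept are hypothetical, and it uses only Regex.Replace with a MatchEvaluator and HashSet, both available in .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagStripperSketch
{
    // Hypothetical variant of StripTag: removes every tag whose name is not in allowedTags.
    // allowedTags is a comma-separated list such as "b,<br>,p"; null or empty allows nothing.
    public static string StripTagsExcept(string html, string allowedTags)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Case-insensitive set, so "b" also allows <B> and </B>.
        var allowed = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        if (!string.IsNullOrEmpty(allowedTags))
        {
            foreach (string raw in allowedTags.Split(','))
            {
                // Accept "br", "<br>" and "<br/>" as spellings of the same tag name.
                string name = raw.Trim(' ', '<', '>', '/');
                if (name.Length > 0)
                {
                    allowed.Add(name);
                }
            }
        }

        // Group 1 captures the tag name; it is empty for things like <!-- comments -->,
        // which are therefore never in the allowed set and always removed.
        return Regex.Replace(
            html,
            @"</?([A-Za-z0-9]*)[^<>]*>",
            m => allowed.Contains(m.Groups[1].Value) ? m.Value : string.Empty);
    }
}
```

With this sketch, StripTagsExcept("<br/>this is an <B>example</B>", "b,<br>") returns the input unchanged, and passing null as the second argument returns "this is an example". Like the original, it treats any text between < and > as a tag, so it is not a substitute for a real HTML parser.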

score 0 · Accepted Answer

Here is the complete code:

Strip tags.

public static string StripTags(string source)
{
  return Regex.Replace(source, "<.*?>", string.Empty);
}

Decode entities.

public static string DecodeHtmlEntities(string text)
{
    return HttpUtility.HtmlDecode(text);
}
