c# - GetSafeHtmlFragment removes all HTML tags

Question

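I am using GetSafeHtmlFragment on my website, and I found that every tag except <p> and <a> is removed.

I looked into it and found that Microsoft has not provided a fix.

Is there an alternative or a workaround?

Thanks.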

score 8 · Answer

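Amazing that Microsoft, in version 4.2.1, badly overcompensated for a security hole in the 4.2 XSS library and still has not released an update a year later. As I read someone comment somewhere, the GetSafeHtmlFragment method should have been renamed to StripHtml.

I ended up using the HtmlSanitizer library suggested in this related SO question. I like that it is available as a package through NuGet.

The library basically implements a variation of the whitelist approach used by the now accepted answer. However, it is based on CsQuery instead of the HTML Agility Pack. The package also provides some additional options, such as being able to keep style information (e.g. HTML attributes). Using this library resulted in code in my project like the code below, which is, at the very least, a lot less code than the accepted answer :).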

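using System.Collections.Generic;
using Html; // namespace as given in this answer; it may differ in other versions of the package

...

var sanitizer = new HtmlSanitizer();
// Only these tags survive sanitization; everything else is removed.
sanitizer.AllowedTags = new List<string> { "p", "ul", "li", "ol", "br" };
string sanitizedHtml = sanitizer.Sanitize(htmlString);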
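
The attribute and style options mentioned in this answer could be configured roughly as follows. This is only a sketch under some assumptions: it targets a recent HtmlSanitizer release, where (as far as I know) the namespace is Ganss.Xss and the allow-lists are sets you modify rather than lists you assign. The class and method names (SanitizerSketch, Clean) are made up for illustration, so check the version of the package you actually install.

```csharp
using System;
using Ganss.Xss; // assumption: namespace used by recent HtmlSanitizer releases

static class SanitizerSketch
{
    // Hypothetical helper, not part of the library.
    public static string Clean(string html)
    {
        var sanitizer = new HtmlSanitizer();

        // Replace the default tag allow-list with our own.
        sanitizer.AllowedTags.Clear();
        foreach (var tag in new[] { "p", "ul", "li", "ol", "br", "a" })
            sanitizer.AllowedTags.Add(tag);

        // Keep only the attributes we explicitly trust.
        sanitizer.AllowedAttributes.Clear();
        sanitizer.AllowedAttributes.Add("href");
        sanitizer.AllowedAttributes.Add("style");

        // Inside style="...", keep only these CSS properties.
        sanitizer.AllowedCssProperties.Clear();
        sanitizer.AllowedCssProperties.Add("color");

        return sanitizer.Sanitize(html);
    }
}
```

Calling SanitizerSketch.Clean on markup that contains a script element or an onclick attribute should return the markup without them, while a style attribute that only sets color is kept.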

score 2 · Accepted Answer

An alternative solution is to use the Html Agility Pack together with your own whitelist of tags:

using System;
using System.IO;
using System.Text;
using System.Linq;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Node names that are allowed to remain in the document.
        var whiteList = new[]
            { 
                "#comment", "html", "head", 
                "title", "body", "img", "p",
                "a"
            };
        var html = File.ReadAllText("input.html");
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // Collect first and remove afterwards, so the tree is not modified while it is being walked.
        var nodesToRemove = new List<HtmlAgilityPack.HtmlNode>();
        var e = doc
            .CreateNavigator()
            .SelectDescendants(System.Xml.XPath.XPathNodeType.All, false)
            .GetEnumerator();
        while (e.MoveNext())
        {
            var node =
                ((HtmlAgilityPack.HtmlNodeNavigator)e.Current)
                .CurrentNode;
            if (!whiteList.Contains(node.Name))
            {
                nodesToRemove.Add(node);
            }
        }
        // Removing a node also removes everything nested inside it.
        nodesToRemove.ForEach(node => node.Remove());
        var sb = new StringBuilder();
        using (var w = new StringWriter(sb))
        {
            doc.Save(w);
        }
        Console.WriteLine(sb.ToString());
    }
}
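
The same whitelist idea can be wrapped in a reusable method that also filters attributes, since removing disallowed elements alone leaves attributes such as onclick on the tags that remain. The following is a sketch, not part of the answer above: the class name, the attribute whitelist, and the decision to leave text and comment nodes untouched are all assumptions for illustration. It does not check URL schemes in href or src, so on its own it is not a complete XSS filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class WhitelistSanitizer
{
    // Hypothetical allow-lists; adjust them to your own needs.
    static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "p", "a", "img" };

    static readonly HashSet<string> AllowedAttributes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "href", "src", "alt", "title" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the element nodes first: the tree must not be
        // modified while Descendants() is still being enumerated.
        var elements = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element)
            .ToList();

        foreach (var element in elements)
        {
            if (!AllowedTags.Contains(element.Name))
            {
                // Drops the element together with everything inside it.
                element.Remove();
                continue;
            }

            // Copy the attribute list before removing entries from it.
            foreach (var attribute in element.Attributes.ToList())
            {
                if (!AllowedAttributes.Contains(attribute.Name))
                    attribute.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Unlike the program above, this version works on an HTML fragment passed in as a string and returns the result, which makes it easier to call from a web application than reading and writing files.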
