Which libraries can you recommend besides HtmlAgilityPack and Tidy?

To apply XPath queries to HTML content, I use either Tidy (as a console program, with some tricks to get a C# XmlDocument) or Html Agility Pack. Both libraries are outdated: HAP has not changed since May 2010, and Tidy not since 2008. I had a bad experience with HAP because it did not fix the document structure by closing tags, even after applying the following trick:

public static HtmlDocument MakeEmptyDocument()
{
    HtmlDocument doc = new HtmlDocument();
    doc.OptionAutoCloseOnEnd = true;
    doc.OptionFixNestedTags = true;
    doc.OptionOutputAsXml = true;
    doc.OptionWriteEmptyNodes = true;
    return doc;
}

public static HtmlDocument LoadHtmlDocumentFromString(string content)
{
    HtmlDocument doc = MakeEmptyDocument();
    StringBuilder sb = new StringBuilder();
    using (StringWriter sw = new StringWriter(sb))
    {
        // ...
    }

    using (StringReader sr = new StringReader(sb.ToString()))
    {
        // ...
    }
    return doc;
}
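
For readers who want to try the round trip end to end, here is a minimal sketch of how HTML can be pushed through HAP and into an XmlDocument for XPath queries. It is not the exact code from my project: the class name HtmlToXml, the method ToXmlDocument and the sample markup are made up for illustration, and it assumes the HtmlAgilityPack package is referenced. If HAP leaves the markup ill-formed, which is exactly the problem described here, LoadXml will throw an XmlException.

```csharp
using System;
using System.IO;
using System.Xml;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class HtmlToXml
{
    public static XmlDocument ToXmlDocument(string html)
    {
        // Same options as in the question: ask HAP to repair and emit XML.
        var htmlDoc = new HtmlDocument
        {
            OptionAutoCloseOnEnd = true,
            OptionFixNestedTags = true,
            OptionOutputAsXml = true,
            OptionWriteEmptyNodes = true
        };
        htmlDoc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            // Serialize HAP's tree as XML text.
            htmlDoc.Save(writer);

            // Parse that text with the standard XML stack.
            // Throws XmlException if the output is still not well-formed.
            var xml = new XmlDocument();
            xml.LoadXml(writer.ToString());
            return xml;
        }
    }

    public static void Main()
    {
        // Sample input with unclosed <p> tags (made up for this sketch).
        XmlDocument xml = ToXmlDocument("<html><body><p>first<p>second</body></html>");

        // Ordinary XPath against the converted document.
        XmlNodeList paragraphs = xml.SelectNodes("//p");
        Console.WriteLine(paragraphs.Count);
    }
}
```

If only XPath is needed, calling htmlDoc.DocumentNode.SelectNodes("//p") directly on the HAP document avoids the XML conversion, although it does not address the unclosed-tag problem either.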

In general I preferred Tidy, but I now have a case where it completely breaks a fairly simple document and removes a large part of its content. So it looks like we need alternatives that can be used from .NET.


1 Answer


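The Tidy project has been taken over by the HTACG (HTML Tidy Advocacy Community Group), which has now released tidy5 together with the libtidy libraries (as of late 2015). These libraries provide a C interface that is "callable from a huge number of programming languages". See the following:

HTML Tidy project (developer section)

answered 2016-01-09T20:52:10.307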