c# - HTML to RichTextBox as plain text with hyperlinks

Question

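Having read a lot about stripping HTML without using regexes, I'm wondering how to get some links into my RichTextBox without also getting all the messy HTML in the content I download from a newspaper website.

What I have: HTML from a newspaper website.

What I want: the article as plain text in a RichTextBox, but with links (i.e. replacing <a href="foo">bar</a> with <Hyperlink NavigateUri="foo">bar</Hyperlink>).

HtmlAgilityPack gives me HtmlNode.InnerText (with all HTML tags stripped) and HtmlNode.InnerHtml (with all tags kept). I can get the URL and text of each link with articlenode.SelectNodes(".//a"), but how do I know where to insert them into the plain text from HtmlNode.InnerText?

Any hints would be appreciated.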

score 0 · Accepted Answer

Here is what you can do (shown with a sample console application, but the idea is the same for Silverlight):

Suppose you have this HTML:

<html>
<head></head>
<body>
Link 1: <a href="foo1">bar</a>
Link 2: <a href="foo2">bar2</a>
</body>
</html>

Then this code:

HtmlDocument doc = new HtmlDocument();
doc.Load(myFileHtm);

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
    // replace the <a> element in the DOM, at the exact same place,
    // with a deep clone that has a different name
    HtmlNode newNode = node.ParentNode.ReplaceChild(node.CloneNode("Hyperlink", true), node);

    // copy the href value into NavigateUri, then drop the href attribute
    newNode.SetAttributeValue("NavigateUri", newNode.GetAttributeValue("href", null));
    newNode.Attributes.Remove("href");
}
doc.Save(Console.Out);

will output:

<html>
<head></head>
<body>
Link 1: <hyperlink navigateuri="foo1">bar</hyperlink>
Link 2: <hyperlink navigateuri="foo2">bar2</hyperlink>
</body>
</html>
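
The saved markup above comes out with lowercased names (hyperlink, navigateuri), while XAML names are case-sensitive, so it may still need post-processing before a RichTextBox can use it. As an alternative, here is a minimal sketch that walks the DOM in document order and collects text and link segments, which also answers the "where does each link go" part of the question. The Segment and LinkSegments types and the Extract method are hypothetical names made up for this illustration; only the HtmlAgilityPack calls are real, and skipping script and style elements is an assumption about what counts as article text.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper type: one piece of article text,
// carrying a URL only when it came from an <a> element.
public sealed class Segment
{
    public string Text { get; }
    public string Uri { get; }   // null for ordinary text

    public Segment(string text, string uri)
    {
        Text = text;
        Uri = uri;
    }
}

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class LinkSegments
{
    // Parses the HTML and returns text and link segments in document order.
    public static List<Segment> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var segments = new List<Segment>();
        Walk(doc.DocumentNode, segments);
        return segments;
    }

    private static void Walk(HtmlNode node, List<Segment> segments)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.NodeType == HtmlNodeType.Text)
            {
                // Ordinary text: decode entities such as &amp; and keep it.
                string text = HtmlEntity.DeEntitize(child.InnerText);
                if (text.Length > 0)
                {
                    segments.Add(new Segment(text, null));
                }
            }
            else if (child.NodeType == HtmlNodeType.Element && child.Name == "a")
            {
                // Link: keep its visible text together with its href.
                segments.Add(new Segment(
                    HtmlEntity.DeEntitize(child.InnerText),
                    child.GetAttributeValue("href", null)));
            }
            else if (child.NodeType == HtmlNodeType.Element
                     && child.Name != "script" && child.Name != "style")
            {
                // Any other element: drop the tag, keep descending for its text.
                Walk(child, segments);
            }
        }
    }
}
```

Each segment could then be turned into an inline for the RichTextBox: a Run for segments whose Uri is null, and a Hyperlink with NavigateUri set from Uri for the others. Collecting plain segments first keeps the HTML parsing independent of the UI framework.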
