c# - How to get/scrape HTML text and images on Windows Phone?

Question

Hi, I'd like to know how to scrape the text inside lists (ul, li) of an HTML site on Windows Phone. I want to build an RSS feed reader. Please explain in detail; I'm new to HtmlAgilityPack. Thanks.

score 0 · Accepted Answer

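This isn't as simple as you might think. You will have to use the HTML Agility Pack to parse and normalize the HTML content. But you will also need to walk each node and decide whether it is a content node, i.e. you will want to ignore DIVs, embeds, and so on.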

I'll try to help you get started.

Reading the document

// Replace the placeholder string with the address of the page you want to read.
Uri url = new Uri("<Your url>");
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument document = web.Load(url.AbsoluteUri);
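
If the synchronous `web.Load` call is not available in the phone build of the library, one alternative is to download the markup yourself and hand the string to `HtmlDocument.LoadHtml`. The following is only a sketch under that assumption: `PageLoader`, `LoadAsync`, and `Parse` are hypothetical names that are not part of HtmlAgilityPack, and the error handling is deliberately minimal.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class PageLoader
{
    // Downloads the page in the background and passes the parsed
    // document to the callback once the download has finished.
    public static void LoadAsync(Uri url, Action<HtmlDocument> onLoaded)
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (sender, e) =>
        {
            // Skip failed or cancelled downloads; a real app would report them.
            if (e.Error != null || e.Cancelled)
            {
                return;
            }
            onLoaded(Parse(e.Result));
        };
        client.DownloadStringAsync(url);
    }

    // Parses an HTML string that is already in memory.
    public static HtmlDocument Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document;
    }
}
```

The callback receives the same kind of `HtmlDocument` that `web.Load` returns, so the extraction code that follows works unchanged.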

Here is how to extract the images and the text

// DocumentNode is the root node of the parsed document.
var docNode = document.DocumentNode;
// If you just want all the text within the document, then life is simpler.
string htmlText = docNode.InnerText;

// Get images
IEnumerable<HtmlNode> imageNodes = docNode.Descendants("img");
// Now iterate through all the images and do what you like...
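
Since the question asks specifically about list items, here is a rough sketch of how the same `Descendants` approach could pull out the text of each `li` and the `src` of each image. `ListScraper` and its two methods are hypothetical names, and the code assumes the lists are reasonably well-formed (every `li` is closed and sits directly under a `ul`).

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class ListScraper
{
    // Returns the trimmed, entity-decoded text of every <li> whose parent is a <ul>.
    public static List<string> GetListItemTexts(HtmlDocument document)
    {
        return document.DocumentNode
            .Descendants("li")
            .Where(li => li.ParentNode != null && li.ParentNode.Name == "ul")
            .Select(li => HtmlEntity.DeEntitize(li.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }

    // Returns the src attribute of every <img> that has one.
    public static List<string> GetImageSources(HtmlDocument document)
    {
        return document.DocumentNode
            .Descendants("img")
            .Select(img => img.GetAttributeValue("src", string.Empty))
            .Where(src => src.Length > 0)
            .ToList();
    }
}
```

Note that `InnerText` includes the text of nested elements, so an `li` that contains a nested list will also carry the nested items' text. The image sources may be relative URLs; combine them with the page address (for example with `new Uri(baseUri, src)`) before downloading.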

If you want Readability/Instapaper-style cleanup, download NReadability from https://github.com/marek-stoj/NReadability
