c# - Best way to scrape source code from a webpage?

Question

I'm working on a C# app. What is the best way to scrape source code from a webpage?

Right now, I am just viewing the page source in my browser (Chrome), copying & pasting it into a text file, and sucking it into a parser.

I was thinking I'd first create a textbox in my application where I'd be able to paste a URL. The application would then pull that page's source code and then pass it into my parser.

score 2 · Accepted Answer

I would consider HtmlAgilityPack. You can easily download a page like this:

// Requires: using System.Net; using HtmlAgilityPack;
HtmlDocument document = new HtmlDocument();
document.LoadHtml(new WebClient().DownloadString("http://www.bing.com"));

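If you're also looking for a good parser, I've had a good experience with ScrapySharp. It adds extension methods to HtmlAgilityPack's HtmlDocument so you can easily select elements on the page with CSS selectors, much like you would in jQuery: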

// Requires: using ScrapySharp.Extensions;
var links = document.DocumentNode.CssSelect(".sessions .main-head-row td.download a.text-pdf");
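
For the "paste a URL into a textbox" part of the question, here is a minimal sketch of the download step on its own. It swaps in HttpClient for the WebClient call above, since WebClient is marked obsolete in newer .NET versions. The PageSource class and its method names are hypothetical, made up for illustration, and the code assumes the textbox contents arrive as a plain string.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not part of HtmlAgilityPack or ScrapySharp).
static class PageSource
{
    // One shared HttpClient for the lifetime of the app.
    private static readonly HttpClient Http = new HttpClient();

    // True only when the input is an absolute http or https URL.
    public static bool TryParseUrl(string input, out Uri uri)
    {
        return Uri.TryCreate(input?.Trim(), UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Downloads the page and returns its source as a string.
    public static async Task<string> FetchAsync(string input)
    {
        if (!TryParseUrl(input, out Uri uri))
            throw new ArgumentException("Expected an absolute http(s) URL.", nameof(input));

        return await Http.GetStringAsync(uri);
    }
}
```

The returned string could then go straight into document.LoadHtml(...) as in the snippet above, or into whatever parser you already have.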
