I'm working on a C# app. What is the best way to scrape source code from a webpage?

Right now, I am just viewing the page source in my browser (Chrome), copying & pasting it into a text file, and sucking it into a parser.

I was thinking I'd first create a textbox in my application where I'd be able to paste a URL. The application would then pull that page's source code and then pass it into my parser.

1 Answer

I'd consider HtmlAgilityPack. You can easily download a page like this:

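// Requires: using System.Net; using HtmlAgilityPack;
HtmlDocument document = new HtmlDocument();
using (WebClient client = new WebClient()) // dispose the client when done
    document.LoadHtml(client.DownloadString("http://www.bing.com"));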
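To tie this back to the textbox idea in the question, here is a rough sketch of validating a pasted URL and pulling the page source. It swaps in HttpClient for WebClient, which is a substitution on my part and not what the snippet above uses. The names PageSource, TryParseUrl and DownloadAsync are made up for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper; the class and method names are illustrative only.
public static class PageSource
{
    // One shared HttpClient for the lifetime of the app.
    private static readonly HttpClient Client = new HttpClient();

    // Accept only absolute http/https URLs, e.g. text pasted into a textbox.
    public static bool TryParseUrl(string input, out Uri uri)
    {
        return Uri.TryCreate(input?.Trim(), UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Download the raw HTML; throws HttpRequestException on a non-success status.
    public static async Task<string> DownloadAsync(Uri uri)
    {
        using (HttpResponseMessage response = await Client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

If TryParseUrl succeeds, the string returned by DownloadAsync can be handed to document.LoadHtml(...) or to whatever parser you already have.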

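If you're also looking for a good parser, I've had good experiences with ScrapySharp. It adds extension methods to HtmlAgilityPack's HtmlDocument so you can easily select elements on a page using CssSelectors, much like you would in jQuery, like so: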

// Requires: using ScrapySharp.Extensions;
var links = document.DocumentNode.CssSelect(".sessions .main-head-row td.download a.text-pdf");
answered 2013-08-23T16:32:59.730