c# - htmlagilitypack not loading full content of the page

Question

I need to screen-scrape a website from a given set of URLs. When I load the page http://cks.nice.org.uk/?char=B , I get all of the content (in the doc object below) except the links (anchor elements) inside the div with class="list-wrapper".

Any ideas? Thanks.

using System;
using HtmlAgilityPack;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://cks.nice.org.uk/?char=B");
    }
}

score 0 · Accepted Answer

I'm not familiar with HtmlAgilityPack or C# in general, but I can tell you what I would do from a scraping point of view.

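The document you need to fetch is http://cks.nice.org.uk/js/topics.txt, which gives you a nice JSON structure of the topic names and their URLs. The links in that div are presumably filled in by script from this file, which would explain why HtmlAgilityPack (which does not run JavaScript) never sees them. Parse the file and you will see an array of objects such as: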

{"Title":"Achilles tendinopathy","Slug":"achilles-tendinopathy","Specialities":["Injuries","Musculoskeletal"]},
{"Title":"Acne vulgaris","Slug":"acne-vulgaris","Specialities":["Skin and nail"]}

Take the "Slug" from each object and append it to the base URL to get each topic page, e.g. http://cks.nice.org.uk/achilles-tendinopathy
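
Since the question is in C#, here is a minimal sketch of that last step. It is not tested against the live site: it assumes the top-level value in topics.txt is a JSON array of topic objects, as described above, and it uses System.Text.Json, which may need to be added as a package on older .NET Framework projects. The names TopicLinks and BuildTopicUrls are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class TopicLinks
{
    // Base URL from the question; topic pages are assumed to be BaseUrl + Slug.
    public const string BaseUrl = "http://cks.nice.org.uk/";

    // Returns one topic URL for every object in the array that has a string "Slug".
    // Assumption: the JSON text is a top-level array of topic objects.
    public static List<string> BuildTopicUrls(string json)
    {
        var urls = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement topic in doc.RootElement.EnumerateArray())
            {
                // Skip anything that is not an object with a string "Slug".
                if (topic.ValueKind == JsonValueKind.Object
                    && topic.TryGetProperty("Slug", out JsonElement slug)
                    && slug.ValueKind == JsonValueKind.String)
                {
                    urls.Add(BaseUrl + slug.GetString());
                }
            }
        }
        return urls;
    }
}
```

The JSON text itself could be downloaded with something like new System.Net.WebClient().DownloadString("http://cks.nice.org.uk/js/topics.txt") and passed to BuildTopicUrls. Each returned URL can then be loaded with HtmlWeb.Load exactly as in the question.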
