1

我想使用HTML Agility 实用程序从 URL 中提取标题、描述和图像, 到目前为止,我无法找到一个易于理解的示例并且可以帮助我做到这一点。

如果有人可以帮助我举例,我将不胜感激,以便我可以提取标题、描述并让用户选择从一系列图像中选择图像(当我们共享链接时类似于 Facebook 的东西)。

更新:

我在 .aspx 页面上放置了标题、描述和按钮、文本框的标签,并且我在按钮单击事件上触发了以下代码。但它为所有值返回 null 。可能是我做错了什么。

我使用了以下示例 URLhttp://edition.cnn.com/2012/10/31/world/asia/india/index.html?hpt=hp_t2

protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    HtmlDocument doc = new HtmlDocument();
    var response = txtURL.Text;
    doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "description"
                   select x.InnerText).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         select x.Attributes["src"].Value).ToList<String>();

    lblTitle.Text = title;
    lblDescription.Text = desc;
}

上面的代码让我得到所有变量的空值

如果我用这个修改代码

HtmlDocument doc = new HtmlDocument();
        var url = txtURL.Text;

        var webGet = new HtmlWeb();
         doc = webGet.Load(url);

在这种情况下,它只会让我的标题和描述值再次为空

4

1 回答 1

3
protected void btnGetURLDetails_Click(object sender, EventArgs e)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(txtURL.Text));
    request.Method = WebRequestMethods.Http.Get;

    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    StreamReader reader = new StreamReader(response.GetResponseStream());

    String responseString = reader.ReadToEnd();

    response.Close();

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(responseString);

    String title = (from x in doc.DocumentNode.Descendants()
                where x.Name.ToLower() == "title"
                select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
               where x.Name.ToLower() == "meta"
               && x.Attributes["name"] != null
               && x.Attributes["name"].Value.ToLower() == "description"
               select x.Attributes["content"].Value).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                     where x.Name.ToLower() == "img"
                     select x.Attributes["src"].Value).ToList<String>();

   lblTitle.Text = title;
   lblDescription.Text = desc;

}

于 2012-10-31T13:13:44.253 回答