c# - Extract table rows with a specific attribute using HTML Agility Pack

Question

Consider this code:

<tr>
                                                <td valign=top class="tim_new"><a href="/stocks/company_info/pricechart.php?sc_did=MI42" class="tim_new">3M India</a></td>
                                                <td class="tim_new" valign=top><a href='/stocks/marketstats/indcomp.php?optex=NSE&indcode=Diversified' class=tim>Diversified</a></td>

I want to use HTML Agility Pack to write code that extracts the links in the first row.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebScraper
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml("http://theurl.com");
            try
            {
                var links = doc.DocumentNode.SelectNodes("//td[@class=\"tim_new\"]");

            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                Console.WriteLine(ex.StackTrace);
                Console.ReadKey();
            }

        }
    }
}

When I try to insert a foreach (var link in links) loop inside the try block, a runtime error is thrown.

score 1 · Accepted Answer

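The line doc.LoadHtml("http://theurl.com"); will not work. The argument to LoadHtml must be a string containing HTML, not a URL, so you have to fetch the HTML document first and only then parse it. As written, the URL itself is parsed as plain text, SelectNodes finds no matching nodes and returns null, and iterating over null in a foreach would explain the runtime error.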
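A minimal sketch of the two ways to get a parsed document, assuming HtmlAgilityPack is referenced in the project. The class and method names (PageLoader, LoadFromUrl, LoadFromString) are made up for illustration and are not part of the library; HtmlWeb.Load and HtmlDocument.LoadHtml are the library calls doing the work.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, named here only for illustration.
static class PageLoader
{
    // Downloads the page at `url` and parses it.
    // HtmlWeb performs the HTTP request, so a URL is the right input here.
    public static HtmlDocument LoadFromUrl(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }

    // Parses markup that is already in memory.
    // This is the kind of input LoadHtml expects: HTML text, not an address.
    public static HtmlDocument LoadFromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

In the question's Main method, this would amount to replacing the LoadHtml call with something like HtmlDocument doc = PageLoader.LoadFromUrl("http://theurl.com");.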

Once the document is loaded, for this particular example you can use:

IEnumerable<string> links = doc.DocumentNode
                               .SelectNodes("//a[@class='tim_new']")
                               .Select(n => n.Attributes["href"].Value);
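The query above assumes at least one node matches. SelectNodes returns null rather than an empty collection when nothing matches, so a more defensive variant might look like the sketch below. LinkExtractor and ExtractLinks are hypothetical names chosen for this example; GetAttributeValue is used so that an anchor without an href yields an empty string instead of throwing.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, named here only for illustration.
static class LinkExtractor
{
    // Returns the href of every <a class="tim_new"> element in `html`.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@class='tim_new']");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty)) // "" if href is missing
            .Where(href => href.Length > 0)                         // drop anchors without href
            .ToList();
    }
}
```

With the markup from the question, only the first anchor carries class tim_new (the second uses class tim), so this query picks up the 3M India link and skips the Diversified one.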
