I want to extract some values from the HTML below, and I am using the following C# code with HtmlAgilityPack to do it.

I only want the address and the phone number.

<div class="company-info">
    <div id="o-company" class="edit-overlay-section" style="padding-top:5px; width: 400px;">
        <a href="http://www.manta.com/c/mm23df2/us-cellular" class="company-name">
            <h1 class="profile-company_name" itemprop="name">US Cellular</h1>
        </a>
    </div>      
    <div class="addr addr-co-header-gamma" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
        <em>United States Cellular Corporation</em>
        <div class="company-address">   
        <div itemprop="streetAddress">2401 12th Avenue NW # 104B</div>
            <span class="addressLocality" itemprop="addressLocality">Ardmore</span>,
            <span class="addressRegion" itemprop="addressRegion">OK</span>      
            <span class="addresspostalCode" itemprop="postalCode">73401-1471</span>
        </div>
        <dl class="phone_info"><dt>Phone:</dt>
        <dd class="tel" itemprop="telephone">(580) 490-3333</dd>
...

C# code:

private HtmlDocument ParseLink(string URL)
{ 
    HtmlDocument hDoc = new HtmlDocument();
    try
    {
        WebClient wClient = new WebClient();

        byte[] bData = wClient.DownloadData(pageurl);

        hDoc.LoadHtml(ASCIIEncoding.ASCII.GetString(bData));
        Response.Write("<table><tr><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//div[@itemprop='company-address']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</tr></td><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='addressLocality']"))
        {

            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</tr></td><td>");   

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='addressRegion']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }

        Response.Write("</tr></td><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='postalCode']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }

        Response.Write("</tr></td><td>"); 

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//dd[@itemprop='telephone']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</td>");
        Response.Write("</tr></table>");

    }
    catch (Exception ex)
    {
        Response.Write(ex.Message);
        hDoc.LoadHtml("");
    }

    return hDoc;
}

However, when I compile this code, I get this error:

"Object reference not set to an instance of an object"

Can anyone help me? Thank you.

1 Answer

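You need to provide more information about the exception you are getting (for example, which line throws it), but...

If no nodes match the XPath expression, the SelectNodes method returns null, which means you have to check whether the returned value is null before iterating over the nodes. Something like: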

var companyAddressNodes = hDoc.DocumentNode.SelectNodes("//div[@itemprop='company-address']");

if (companyAddressNodes == null) {
    // Throw a proper exception here, log the error, or handle it however you want.
    throw new Exception("No company address node found. Perhaps the page layout changed?");
}

foreach (HtmlNode hNode in companyAddressNodes)
{
    Response.Write(hNode.InnerText.ToString());
}
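
The same null check can be wrapped in a small helper so that each field lookup degrades gracefully instead of throwing. The following is only a sketch under some assumptions: `TextOrDefault` and `CompanyInfoSketch` are hypothetical names, the HTML is a trimmed copy of the snippet from the question, and `SelectSingleNode` is used because each field appears once. Note that in the HTML shown, `company-address` is a class value rather than an `itemprop` value, so the XPath `//div[@itemprop='company-address']` has nothing to match there; the sketch queries `streetAddress` instead.

```csharp
using System;
using HtmlAgilityPack;

static class CompanyInfoSketch
{
    // Hypothetical helper: returns the trimmed text of the first node that
    // matches the XPath, or the fallback when SelectSingleNode returns null.
    static string TextOrDefault(HtmlDocument doc, string xpath, string fallback = "")
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    static void Main()
    {
        var doc = new HtmlDocument();

        // Trimmed copy of the markup from the question.
        doc.LoadHtml(
            "<div class=\"company-address\">" +
            "<div itemprop=\"streetAddress\">2401 12th Avenue NW # 104B</div>" +
            "<span itemprop=\"addressLocality\">Ardmore</span>" +
            "</div>" +
            "<dl class=\"phone_info\"><dt>Phone:</dt>" +
            "<dd class=\"tel\" itemprop=\"telephone\">(580) 490-3333</dd></dl>");

        Console.WriteLine(TextOrDefault(doc, "//div[@itemprop='streetAddress']"));
        Console.WriteLine(TextOrDefault(doc, "//dd[@itemprop='telephone']"));

        // No element carries itemprop='company-address', so the fallback is used
        // here instead of a NullReferenceException being thrown.
        Console.WriteLine(TextOrDefault(doc, "//div[@itemprop='company-address']", "(not found)"));
    }
}
```

Whether to fall back to a default value or throw, as in the snippet above, depends on whether a missing field should be treated as an error in your application.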
answered 2012-09-18T17:53:39.370