I have this code:

    private List<string> webCrawler(string url, int levels)
    {
        HtmlAgilityPack.HtmlDocument doc;
        HtmlWeb hw = new HtmlWeb();
        List<string> webSites;
        List<string> csFiles = new List<string>();

        csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
        csFiles.Add("current site name in this level is : " + url);

        doc = hw.Load(url);
        webSites = getLinks(doc);

        if (levels == 0)
        {
            return csFiles;
        }
        else
        {
            int actual_sites = 0;
            for (int i = 0; i < webSites.Count() && i < 20; i++)
            {
                string t = webSites[i];
                if (t.StartsWith("http://") || t.StartsWith("https://"))
                {
                    actual_sites++;
                    csFiles.AddRange(webCrawler(t, levels - 1));
                    Texts(richTextBox1, "Level Number " + levels + " " + t + Environment.NewLine, Color.Red);
                }
            }

            return csFiles;
        }
    }

getLinks() is:

    private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
    {
        List<string> mainLinks = new List<string>();
        var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
        if (linkNodes != null)
        {
            foreach (HtmlNode link in linkNodes)
            {
                var href = link.Attributes["href"].Value;
                mainLinks.Add(href);
            }
        }
        return mainLinks;
    }

The problem is that when I crawl google.com, for example, after a few pages the crawler reaches this site:

http://picasa.google.co.il/intl/iw/#utm_source=iw-all-more&utm_campaign=iw-pic&utm_medium=et

Then I get an exception on this line:

doc = hw.Load(url);

The error is: The remote name could not be resolved: 'picasa.google.co.il'

The full exception is:

System.Net.WebException was unhandled
  Message=The remote name could not be resolved: 'picasa.google.co.il'
  Source=System
  StackTrace:
       at System.Net.HttpWebRequest.GetResponse()
       at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1446
       at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
       at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
       at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 79
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108
       at GatherLinks.Form1.webCrawler(String url, Int32 levels) in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 108
       at GatherLinks.Form1..ctor() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Form1.cs:line 31
       at GatherLinks.Program.Main() in D:\C-Sharp\GatherLinks\GatherLinks\GatherLinks\Program.cs:line 18
       at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: 

How can I fix or work around this?

Thank you.

1 Answer

The exception is telling you that it cannot resolve picasa.google.co.il to an IP address. You probably just need to verify that the name is correct.
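If you would rather verify the name from code, one option is to ask DNS directly before loading the page. The following is only a minimal sketch: `HostCheck` and `CanResolveHost` are hypothetical names that are not part of the question's code, and the sketch assumes that only http and https links are of interest.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class HostCheck
{
    // Hypothetical helper: returns true only if the URL is an absolute
    // http/https URL whose host name resolves to at least one address.
    public static bool CanResolveHost(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false; // not a well-formed absolute URL
        }

        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            return false; // mailto:, javascript:, etc. have no host to crawl
        }

        try
        {
            return Dns.GetHostAddresses(uri.DnsSafeHost).Length > 0;
        }
        catch (SocketException)
        {
            return false; // the DNS lookup failed, e.g. no such host
        }
    }
}
```

A successful lookup does not guarantee that the page will load (the server can still time out or return an error), so this check narrows down the cause rather than replacing error handling.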

Open a command window and type:

ping picasa.google.co.il

You will find that your computer cannot reach this server because it has no DNS entry.
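A crawler that follows arbitrary links will keep running into hosts like this one, so it may help to treat a failed load as a skipped page rather than a fatal error. The sketch below shows one way to do that by catching `WebException` around the load. `SafeCrawler`, `Crawl`, the `fetchLinks` delegate, and the `failed` list are all hypothetical names introduced for illustration; `fetchLinks` stands in for the question's `hw.Load(url)` followed by `getLinks(doc)`, so the sketch does not depend on HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class SafeCrawler
{
    // fetchLinks is a hypothetical stand-in for "hw.Load(url)" followed by
    // "getLinks(doc)"; it is expected to throw WebException on network errors.
    public static List<string> Crawl(string url, int levels,
        Func<string, List<string>> fetchLinks, List<string> failed)
    {
        var visited = new List<string>();

        List<string> links;
        try
        {
            links = fetchLinks(url);
        }
        catch (WebException ex)
        {
            // Unresolvable host, timeout, HTTP error: record it and move on.
            failed.Add(url + " (" + ex.Status + ")");
            return visited;
        }

        visited.Add(url);
        if (levels == 0)
        {
            return visited;
        }

        // Same limits as the question's loop: at most 20 links per page,
        // and only absolute http/https links are followed.
        for (int i = 0; i < links.Count && i < 20; i++)
        {
            string t = links[i];
            if (t.StartsWith("http://") || t.StartsWith("https://"))
            {
                visited.AddRange(Crawl(t, levels - 1, fetchLinks, failed));
            }
        }
        return visited;
    }
}
```

In the question's form, the delegate could be something like `u => getLinks(hw.Load(u))`. Collecting the failures in a list, instead of swallowing them silently, makes it possible to show afterwards which links were dead.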

Answered 2012-09-11T15:27:35.943