I am trying to download a web page:

string remoteUri = "http://whois.domaintools.com/94.100.179.159";
WebClient myWebClient = new WebClient();
byte[] myDataBuffer = myWebClient.DownloadData(remoteUri);
string download = Encoding.ASCII.GetString(myDataBuffer);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(download);
doc.Save("file1.htm");

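It fails with this error:

WebException was unhandled: (403) Forbidden.

Is there another way to download the page? I tried the HtmlDocument class, but as far as I can tell it needs the web page to be loaded in a browser.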

HtmlWeb hwObject = new HtmlWeb();
string ip = "http://whois.domaintools.com/";
HtmlAgilityPack.HtmlDocument htmldocObject = hwObject.Load(ip);

foreach (HtmlNode link in htmldocObject.DocumentNode.SelectNodes("//meta[@name = 'description']"))
{
    ...
}

2 Answers

using (var myWebClient = new WebClient())
{
    myWebClient.Headers["User-Agent"] = "MOZILLA/5.0 (WINDOWS NT 6.1; WOW64) APPLEWEBKIT/537.1 (KHTML, LIKE GECKO) CHROME/21.0.1180.75 SAFARI/537.1";

    string page = myWebClient.DownloadString("http://whois.domaintools.com/94.100.179.159");

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(page);
}
answered 2012-08-25T10:55:24.107

The site simply returns an error when it finds no User-Agent header in the request. Here is working code:

string remoteUri = "http://whois.domaintools.com/94.100.179.159";
HtmlDocument doc = new HtmlDocument();
using (WebClient myWebClient = new WebClient())
{
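  myWebClient.Headers.Add(HttpRequestHeader.UserAgent, "some browser user agent");
  // Dispose the response stream once the document has been parsed (Stream needs using System.IO).
  using (Stream pageStream = myWebClient.OpenRead(remoteUri))
  {
    doc.Load(pageStream);
  }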
}
doc.Save("file1.htm");

Or, if you want to use HtmlWeb:

HtmlWeb hwObject = new HtmlWeb();
hwObject.UserAgent = "some browser user agent";
//more code...
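
To check whether a failure really is the 403 from the question, it can help to catch the WebException and read the HTTP status from its response. The following is only a minimal sketch using System.Net; the names PageDownloader, CreateClient and TryDownload are hypothetical helpers made up for illustration, not part of WebClient or HtmlAgilityPack.

```csharp
using System;
using System.Net;

static class PageDownloader
{
    // Hypothetical helper: builds a WebClient that sends the given User-Agent header.
    public static WebClient CreateClient(string userAgent)
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = userAgent;
        return client;
    }

    // Hypothetical helper: returns the page text, or null on failure.
    // When the server answered with an HTTP error, 'status' holds its status code;
    // for failures without a response (for example no network) it stays null.
    public static string TryDownload(string uri, string userAgent, out HttpStatusCode? status)
    {
        status = null;
        try
        {
            using (WebClient client = CreateClient(userAgent))
            {
                return client.DownloadString(uri);
            }
        }
        catch (WebException ex)
        {
            var response = ex.Response as HttpWebResponse;
            if (response != null)
            {
                status = response.StatusCode; // HttpStatusCode.Forbidden corresponds to 403
            }
            return null;
        }
    }

    // Possible usage:
    //   HttpStatusCode? status;
    //   string page = PageDownloader.TryDownload(remoteUri, "some browser user agent", out status);
    //   if (page == null) Console.WriteLine("Download failed, status: " + status);
}
```

If status comes back as HttpStatusCode.Forbidden even with a User-Agent set, the site is probably rejecting the request for some other reason, and a different header value may be worth trying.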
answered 2012-08-25T10:54:59.873