c# - What should I do when HtmlAgilityPack.Document tries to load a link that ends in .exe?

Question

This is the function:

private static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            WebClient client = new WebClient();
            //client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
            client.Credentials = CredentialCache.DefaultCredentials;
            client.Proxy = WebRequest.DefaultWebProxy;
            if (useProxy)
            {
                //Proxy                
                if (!string.IsNullOrEmpty(proxyIp))
                {
                    WebProxy p = new WebProxy(proxyIp, proxyPort);
                    if (!string.IsNullOrEmpty(usename))
                    {
                        if (password == null)
                            password = string.Empty;
                        NetworkCredential nc = new NetworkCredential(usename, password);
                        p.Credentials = nc;
                    }
                }
            }
            Stream data = client.OpenRead(url);
            doc.Load(data);
            data.Close();
            return doc;
        }

My program fetches links on each iteration, and after a few iterations the variable url is:

http://appldnld.apple.com/iTunes10/041-7196.20120912.Ber43/iTunesSetup.exe

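If I open this link in Internet Explorer, it tries to download the file. But my program tries to load it as HTML at this line:

doc.Load(data);

After a while the program freezes, and eventually, when I force the application to end in Task Manager, it throws an exception: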

StackOverFlowException was unhandled 

An unhandled exception of type 'System.StackOverflowException' occurred in HtmlAgilityPack.dll

System.StackOverflowException was unhandled
Message: An unhandled exception of type 'System.StackOverflowException' occurred in HtmlAgilityPack.dll

I then set a breakpoint, and the problem occurs at this line:

doc.Load(data);

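The question is: how should I handle links like this? Should I ignore them with a try/catch, or should I still treat them as links? And if at some point in the future I want to use such a link to download the exe file, wouldn't a try/catch be a bad idea?

Edit:

This is what getHtmlDocumentWebClient looks like now: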

private  static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
        {

            HttpWebRequest myHttpWebRequest = null;     //Declare an HTTP-specific implementation of the WebRequest class.
            HttpWebResponse myHttpWebResponse = null;   //Declare an HTTP-specific implementation of the WebResponse class
            //Create Request
            myHttpWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
            myHttpWebRequest.Method = "GET";
            myHttpWebRequest.ContentType = "text/html; encoding='utf-8'";
            //Get Response
            myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            Stream data = myHttpWebResponse.GetResponseStream();//client.OpenRead(url);
            doc.Load(data);
            data.Close();
            return doc;
        }

Same problem. What is wrong with the function now, and how do I actually check that the content is text/html?

score 1 · Accepted Answer

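You should check the Content-Type before trying to parse the response as HTML.
If it is not text/html or one of its variants, don't parse it.

To get the Content-Type, you need to use HttpWebRequest instead of WebClient.
You can then check response.Headers.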
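
The check described above could look roughly like the sketch below. It is only an illustration, not the asker's final code: the class name HtmlFetcher and the helpers IsHtmlContentType and TryLoadHtml are hypothetical, treating application/xhtml+xml as an HTML "variant" is an assumption, and proxy handling from the question is left out. It reads the header through HttpWebResponse.ContentType, which exposes the same value as response.Headers["Content-Type"]. Checking the header first is also safer than wrapping doc.Load in a try/catch, because a StackOverflowException cannot be caught in .NET Framework 2.0 and later.

```csharp
using System;
using System.IO;
using System.Net;

static class HtmlFetcher
{
    // Returns true when a Content-Type header value names an HTML media type.
    // Parameters such as "; charset=utf-8" are ignored.
    // Assumption: application/xhtml+xml is accepted as an HTML variant.
    public static bool IsHtmlContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return false;

        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/xhtml+xml", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the parsed document, or null when the response is not HTML
    // (for example application/octet-stream for an .exe download).
    public static HtmlAgilityPack.HtmlDocument TryLoadHtml(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Decide from the header alone; the body is not read or parsed here.
            if (!IsHtmlContentType(response.ContentType))
                return null;

            using (Stream data = response.GetResponseStream())
            {
                var doc = new HtmlAgilityPack.HtmlDocument();
                doc.Load(data);
                return doc;
            }
        }
    }
}
```

A caller would treat a null result as "this link is not a page to crawl". The URL can still be kept in a separate list if the file should be downloaded later, which keeps the download case apart from HTML parsing.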
