3

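I have some simple code that fetches the response from a Vietnamese website, http://vnexpress.net, but there is a small problem. The first download works fine, but after that the downloaded content contains unknown symbols like this: �\b\0\0\0\0\0\0�\a`I�%&/m.... What is the problem?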

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);

3 Answers

9

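You will find that the response is GZipped. There does not seem to be a way to download it with WebClient unless you create a derived class and modify the underlying HttpWebRequest to allow automatic decompression.

Here is how you can do that: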

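    public class MyWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);

            // Only HTTP(S) requests expose AutomaticDecompression; skip anything else.
            var httpRequest = request as HttpWebRequest;
            if (httpRequest != null)
            {
                httpRequest.AutomaticDecompression = DecompressionMethods.GZip;
            }

            return request;
        }
    }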

And use it like this:

    string address = "http://vnexpress.net";
    MyWebClient webClient = new MyWebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    webClient.Encoding = System.Text.Encoding.UTF8;
    return webClient.DownloadString(address);
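
If subclassing WebClient is not an option, a rough alternative is to fetch the raw bytes with `DownloadData` and decompress them by hand. The sketch below assumes the body is either plain text or gzip-compressed, and it detects gzip by its magic bytes 0x1F 0x8B (consistent with the `\b\0\0\0...` garbage shown in the question). `GzipBody`, `LooksGzipped` and `Decode` are hypothetical names for illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipBody
{
    // True when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] raw)
    {
        return raw != null && raw.Length >= 2 && raw[0] == 0x1F && raw[1] == 0x8B;
    }

    // Decompresses the buffer if it looks gzipped, then decodes it as text.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        if (!LooksGzipped(raw))
        {
            return encoding.GetString(raw);
        }

        using (var input = new MemoryStream(raw))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return encoding.GetString(output.ToArray());
        }
    }
}
```

With a plain `WebClient`, this could be called as `GzipBody.Decode(webClient.DownloadData(address), Encoding.UTF8)`. Letting `AutomaticDecompression` do the work is usually simpler, since it also sends the matching `Accept-Encoding` header.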
answered 2013-02-23T00:09:10.277
1

Try this code and you should be fine:

    string address = "http://vnexpress.net";
    WebClient webClient = new WebClient();
    webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 AlexaToolbar/alxg-3.1");
    return Encoding.UTF8.GetString(Encoding.Default.GetBytes(webClient.DownloadString(address)));

answered 2013-02-22T23:37:17.100
0

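DownloadString requires the server to indicate the character set correctly in the Content-Type response header. If you watch the traffic in Fiddler, you will see that the server sends the character set inside a META tag in the HTML response body instead:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

If you need to handle a response like this, you have to parse the HTML yourself or use a library such as FiddlerCore to do it for you.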
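
A minimal sketch of the "parse it yourself" route: take the raw bytes, look for a charset declaration near the start of the HTML, and fall back to a default when none is found. `MetaCharset`, `Detect` and `Decode` are hypothetical helpers, the regular expression is a simplification rather than a real HTML parser, and the code assumes the page uses an ASCII-compatible encoding so the META tag can be read before the real encoding is known.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MetaCharset
{
    // Matches charset=... inside a <meta> tag (simplified; not a full HTML parser).
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Returns the encoding named in a META tag, or the fallback if none is found or recognized.
    public static Encoding Detect(byte[] raw, Encoding fallback)
    {
        // Assumption: the document is ASCII-compatible, so a Latin-1 view of the head is enough.
        int headLength = Math.Min(raw.Length, 4096);
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(raw, 0, headLength);

        Match match = CharsetPattern.Match(head);
        if (!match.Success)
        {
            return fallback;
        }

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return fallback; // unknown or unsupported charset name
        }
    }

    // Decodes the whole buffer with the detected (or fallback) encoding.
    public static string Decode(byte[] raw, Encoding fallback)
    {
        return Detect(raw, fallback).GetString(raw);
    }
}
```

A call might look like `MetaCharset.Decode(webClient.DownloadData(address), Encoding.UTF8)`. If the body is compressed, it has to be decompressed before this step, otherwise the META tag will not be visible in the bytes.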

answered 2013-02-22T23:50:54.137