.net - Can .NET WebRequest/WebResponse correctly translate accents, umlauts, and entities?

Question

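I'm "screen scraping" my own pages with .NET's WebRequest as a temporary hack.

This works well, but accented and umlauted characters do not come through correctly.

I'm wondering whether any of .NET's many built-in properties and methods can make them translate correctly.

Here is the code I use to grab the page: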

private string getArticle(string urlToGet)
{

    StreamReader oSR = null;

    //Here's the work horse of what we're doing, the WebRequest object 
    //fetches the URL
    WebRequest objRequest = WebRequest.Create(urlToGet);

    //The WebResponse object gets the Request's response (the HTML) 
    WebResponse objResponse = objRequest.GetResponse();

    //Now dump the contents of our HTML in the Response object to a 
    //Stream reader
    oSR = new StreamReader(objResponse.GetResponseStream());


    //And dump the StreamReader into a string...
    string strContent = oSR.ReadToEnd();

    //Here we set up our Regular expression to snatch what's between the 
    //BEGIN and END
    Regex regex = new Regex("<!-- content_starts_here //-->((.|\n)*?)<!-- content_ends_here //-->",
        RegexOptions.IgnoreCase);

    //Here we apply our regular expression to our string using the 
    //Match object. 
    Match oM = regex.Match(strContent);

    //Bam! We return the value from our Match, and we're in business. 
    return oM.Value;
}

score 2 · Accepted Answer

Try using:

System.Net.WebClient client = new System.Net.WebClient();
string html = client.DownloadString(urlToGet);
string decoded = System.Web.HttpUtility.HtmlDecode(html);

Also, take a look at client.Encoding, which sets the encoding WebClient uses when it downloads strings.
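As a rough sketch of how the two suggestions above fit together, the helper below sets client.Encoding before calling DownloadString and then decodes the entities. The names PageFetcher and DownloadDecoded are hypothetical, the UTF-8 default is only an assumption about the page, and System.Net.WebUtility.HtmlDecode is swapped in for System.Web.HttpUtility.HtmlDecode so that no System.Web reference is needed.

```csharp
using System;
using System.Net;
using System.Text;

static class PageFetcher
{
    // Hypothetical helper: downloads a page as text, then decodes HTML entities
    // such as &eacute; or &uuml; into the characters they stand for.
    public static string DownloadDecoded(string url, Encoding encoding = null)
    {
        using (var client = new WebClient())
        {
            // Assumption: UTF-8 unless the caller knows the page uses another
            // charset (for example ISO-8859-1).
            client.Encoding = encoding ?? Encoding.UTF8;

            string html = client.DownloadString(url);

            // WebUtility.HtmlDecode is used here in place of
            // System.Web.HttpUtility.HtmlDecode from the answer.
            return WebUtility.HtmlDecode(html);
        }
    }
}
```

A call might look like PageFetcher.DownloadDecoded(urlToGet, Encoding.GetEncoding("ISO-8859-1")) if the page is known to be served as ISO-8859-1.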

score 0 · Accepted Answer

There is another way to handle this: pass an encoding as the second argument to the StreamReader constructor, like this:

new StreamReader(webRequest.GetResponse().GetResponseStream(), 
                 Encoding.GetEncoding("ISO-8859-1"));

That will do the trick, provided the page is actually encoded as ISO-8859-1.
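If the charset is not known in advance, one option is to take it from the response instead of hard-coding it. The following is a minimal sketch under that assumption: ResponseReader, ResolveEncoding, ReadAll, and Fetch are hypothetical names, and the ISO-8859-1 fallback simply mirrors the answer above. It reads HttpWebResponse.CharacterSet and hands the resulting Encoding to the same two-argument StreamReader constructor.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class ResponseReader
{
    // Maps a charset name (e.g. from HttpWebResponse.CharacterSet) to an
    // Encoding, using the fallback when the name is missing or unrecognized.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset)) return fallback;
        try { return Encoding.GetEncoding(charset.Trim().Trim('"')); }
        catch (ArgumentException) { return fallback; }
    }

    // Reads the whole stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetches a URL and decodes the body with the charset the server declared.
    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet is only available on HTTP responses.
            var http = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(
                http?.CharacterSet,
                Encoding.GetEncoding("ISO-8859-1")); // assumed fallback

            return ReadAll(response.GetResponseStream(), encoding);
        }
    }
}
```

This only helps when the server's Content-Type header states the charset correctly; a page that declares its encoding solely in a meta tag would still need the encoding passed explicitly.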
