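This is what I have so far (it doesn't work). At this point I believe my target is ANSI-encoded, but I'd really rather not have to know that in advance. My browser seems able to work out which encoding to use; how can I do the same?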

static void GetUrl(Uri uri, string localFileName)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    HttpWebResponse response;

    response = (HttpWebResponse)request.GetResponse();

    // Save the stream to file
    Stream responseStream = response.GetResponseStream();
    StreamReader reader = new StreamReader(responseStream, Encoding.Default);
    Stream fileStream = File.OpenWrite(localFileName);
    using (StreamWriter sw = new StreamWriter(fileStream, Encoding.Default))
    {
        sw.Write(reader.ReadToEnd());
        sw.Flush();
        sw.Close();
    }
}

After the answers (so far tested only on UTF-8 sites):

static void GetUrl(Uri uri, string localFileName)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    try
    {
        // Hope GetEncoding() knows how to parse the CharacterSet
        Encoding encoding = Encoding.GetEncoding(response.CharacterSet);
        StreamReader reader = new StreamReader(response.GetResponseStream(), encoding);
        using (StreamWriter sw = new StreamWriter(localFileName, false, encoding))
        {
            sw.Write(reader.ReadToEnd());
            sw.Flush();
            sw.Close();
        }
    }
    finally
    {
        response.Close();
    }
}

2 Answers

There are three ways a web browser can try to detect the character encoding.

Look for (if it is HTML):

<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">

or (for XHTML):

<?xml version="1.0" encoding="ISO-8859-1"?>

Sometimes it is even specified in the HTTP header:

Content-Type: text/html; charset=ISO-8859-1
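
The three places listed above can be checked in code. The following is only a minimal sketch, not how any particular browser does it: `CharsetSniffer`, `DetectCharsetName`, the 1024-byte peek window, and the lookup order (HTTP header first, then the XML declaration, then the meta tag) are all assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET Framework.
static class CharsetSniffer
{
    // Matches charset=... in an HTTP Content-Type value or an HTML <meta> tag.
    static readonly Regex CharsetPattern = new Regex(
        @"charset\s*=\s*[""']?([A-Za-z0-9._:-]+)", RegexOptions.IgnoreCase);

    // Matches encoding="..." inside an XML declaration.
    static readonly Regex XmlEncodingPattern = new Regex(
        @"<\?xml[^>]*encoding\s*=\s*[""']([A-Za-z0-9._:-]+)[""']", RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if none is found.
    public static string DetectCharsetName(string contentTypeHeader, byte[] body)
    {
        // 1. Use the HTTP Content-Type header when it carries a charset.
        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            Match header = CharsetPattern.Match(contentTypeHeader);
            if (header.Success) return header.Groups[1].Value;
        }

        // 2. Otherwise peek at the start of the body. ISO-8859-1 maps every
        //    byte to exactly one char, so the ASCII markup survives decoding.
        int count = Math.Min(body.Length, 1024); // assumed peek window
        string head = Encoding.GetEncoding("ISO-8859-1").GetString(body, 0, count);

        Match xml = XmlEncodingPattern.Match(head);
        if (xml.Success) return xml.Groups[1].Value;

        // 3. Fall back to any charset=... in the markup (e.g. a <meta> tag).
        Match meta = CharsetPattern.Match(head);
        return meta.Success ? meta.Groups[1].Value : null;
    }
}
```

A caller would pass `response.ContentType` and the downloaded bytes, then hand the result to `Encoding.GetEncoding`. The regular expressions are a deliberate simplification; a real HTML parser handles cases (comments, scripts, unusual attribute order) that this sketch does not.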
answered 2008-11-16T10:30:07.930

You should look at the encoding the server used to send the response. Encoding.Default doesn't cut the mustard here. :-)

Stream responseStream = response.GetResponseStream();
Encoding enc = Encoding.GetEncoding(response.CharacterSet);
StreamReader reader = new StreamReader(responseStream, enc);
Stream fileStream = File.OpenWrite(localFileName);
using (StreamWriter sw = new StreamWriter(fileStream, enc))
{  /* ... */ }
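
One caveat with the snippet above: `response.CharacterSet` may be empty, or may hold a name that `Encoding.GetEncoding` does not recognize, in which case `GetEncoding` throws. A small guard can cover both cases. This is a sketch under assumptions: `EncodingResolver` and `Resolve` are hypothetical names, and which fallback encoding is appropriate is left to the caller.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
static class EncodingResolver
{
    // Maps a charset name to an Encoding, returning 'fallback' when the
    // name is missing or unknown.
    public static Encoding Resolve(string charsetName, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charsetName))
            return fallback;

        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charsetName.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for names it does not support.
            return fallback;
        }
    }
}
```

With this in place, the second line of the snippet could become `Encoding enc = EncodingResolver.Resolve(response.CharacterSet, Encoding.UTF8);`, where UTF-8 as the fallback is only one possible choice.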

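To be on the safe side, you can convert everything to UTF-8 and always store your files as UTF-8. That way you don't have to guess the encoding when you read the file back.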
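
The conversion to UTF-8 could look roughly like this. It is a sketch, not a definitive implementation: `Utf8Saver` and `SaveAsUtf8` are hypothetical names, and writing without a byte order mark is an assumption you may want to change.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET Framework.
static class Utf8Saver
{
    // Decodes 'source' using 'sourceEncoding' and writes the text to 'path'
    // as UTF-8, regardless of how the server encoded it.
    public static void SaveAsUtf8(Stream source, Encoding sourceEncoding, string path)
    {
        using (StreamReader reader = new StreamReader(source, sourceEncoding))
        // UTF8Encoding(false) means no byte order mark is written (assumed preference).
        using (StreamWriter writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            // Copy in chunks so large responses are not held in memory at once.
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                writer.Write(buffer, 0, read);
        }
    }
}
```

Called as `Utf8Saver.SaveAsUtf8(response.GetResponseStream(), enc, localFileName);`, this decodes with the server's encoding and always stores UTF-8, so later reads can simply use `Encoding.UTF8`.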

answered 2008-11-16T10:53:39.037