c# - Need help understanding UTF encodings

Question

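Hello, I noticed that when I save a text file with UTF-8 encoding (without a BOM), I can read it perfectly using the UTF-16 encoding in C#. This confuses me a bit, because UTF-8 only uses 8 bits, right? And UTF-16 takes 16 bits per character.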

Now imagine I write the string "ab" to this file as UTF-8: there is one byte for the letter "a" and another byte for "b".

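OK, but how can this UTF-8 file be read using the UTF-16 character set? As I see it, when reading the file, the two bytes of "ab" would be mistaken for a single two-byte character, because UTF-16 needs those 2 bytes.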
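The byte-level reasoning above can be checked directly. The following is a minimal sketch, not part of the original post: the class name `Utf8AsUtf16Demo` and the helper `DecodeAsUtf16` are hypothetical, and it assumes little-endian UTF-16 (`Encoding.Unicode`) with no BOM handling at all.

```csharp
using System;
using System.Text;

public static class Utf8AsUtf16Demo
{
    // Hypothetical helper: decodes raw bytes as UTF-16 little-endian.
    // Encoding.GetString does no BOM detection; it just decodes the bytes.
    public static string DecodeAsUtf16(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    public static void Main()
    {
        // "ab" in UTF-8 is the two bytes 0x61 0x62 (GetBytes emits no BOM).
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("ab");
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 61-62

        // Read as UTF-16LE, those two bytes form a single code unit, 0x6261.
        string misread = DecodeAsUtf16(utf8Bytes);
        Console.WriteLine(misread.Length);                   // 1
        Console.WriteLine(((int)misread[0]).ToString("X4")); // 6261
    }
}
```

So a plain UTF-16 decode of those two bytes does yield one character rather than "ab", which matches the expectation described in the question.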

This is how I read it (t.txt is encoded as UTF-8):

using(StreamReader sr = new StreamReader(File.OpenRead("t.txt"), Encoding.GetEncoding("utf-16")))
{
    Console.Write(sr.ReadToEnd());
    Console.ReadKey();
}
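One possible explanation, which the question itself does not confirm: the `StreamReader(Stream, Encoding)` constructor checks the start of the stream for a byte order mark and, if it finds one, uses the detected encoding instead of the one passed in. The sketch below assumes that this is what happened, that is, that the file actually began with a UTF-8 BOM. It uses a `MemoryStream` in place of t.txt, and the names `BomDetectionDemo` and `ReadAsUtf16` are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: reads bytes the same way the question does, with a
    // StreamReader that is told the data is UTF-16. This constructor overload
    // also looks for a BOM at the start of the stream.
    public static string ReadAsUtf16(byte[] fileBytes)
    {
        using (var sr = new StreamReader(new MemoryStream(fileBytes), Encoding.GetEncoding("utf-16")))
        {
            return sr.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "ab" as UTF-8 without a BOM: 61 62
        byte[] noBom = Encoding.UTF8.GetBytes("ab");
        // The same text preceded by the UTF-8 BOM: EF BB BF 61 62
        byte[] withBom = Encoding.UTF8.GetPreamble().Concat(noBom).ToArray();

        Console.WriteLine(ReadAsUtf16(withBom) == "ab"); // True: BOM detected, decoded as UTF-8
        Console.WriteLine(ReadAsUtf16(noBom) == "ab");   // False: decoded as UTF-16
    }
}
```

If the file really has no BOM, the second case applies and the text should not come back as "ab", so it may be worth checking the first bytes of t.txt in a hex viewer.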

score 5 · Accepted Answer

Check out http://www.joelonsoftware.com/articles/Unicode.html; it will answer all your Unicode questions.

score 1 · Accepted Answer

answered 2011-06-11T04:24:19.883

score 1 · Accepted Answer

Take a look at the following article:

http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
