To answer your question, here is the list of Unicode code points that .NET maps to U+0022 (the symbol you call the "normal double quote") when using a StreamWriter:
- U+0022
- U+02BA
- U+030E
- U+201C
- U+201D
- U+201E
- U+FF02
Using this answer, I quickly wrote something that builds a reverse mapping from UTF-8 to ISO-8859-15 (Latin-9).
    using System;
    using System.Collections.Generic;
    using System.Text;

    // On .NET Core / .NET 5+, code pages such as ISO-8859-15 and 1252 are only
    // available after registering CodePagesEncodingProvider.Instance.
    Encoding utf8 = Encoding.UTF8;
    Encoding latin9 = Encoding.GetEncoding("ISO-8859-15");
    // Note: code page 1252 is Windows-1252; ISO-8859-1 proper is code page 28591.
    Encoding iso = Encoding.GetEncoding(1252);
    // Key: Latin-9 byte as hex. Value: every code point (as hex) that encodes to that byte.
    var map = new Dictionary<string, List<string>>();
    // same code to get each line from the file as per the linked answer
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        string codePointHexAsString = line.Substring(0, line.IndexOf(';'));
        int codePoint = Convert.ToInt32(codePointHexAsString, 16);
        // skip Unicode surrogate area
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            continue;
        string utf16String = char.ConvertFromUtf32(codePoint);
        byte[] utf8Bytes = utf8.GetBytes(utf16String);
        byte[] latin9Bytes = Encoding.Convert(utf8, latin9, utf8Bytes);
        string latin9String = latin9.GetString(latin9Bytes);
        byte[] isoBytes = Encoding.Convert(utf8, iso, utf8Bytes);
        string isoString = iso.GetString(isoBytes); // this is not always the same as latin9String!
        // Use the first encoded byte as the key for the reverse mapping.
        string latin9HexAsString = latin9Bytes[0].ToString("X");
        if (!map.ContainsKey(latin9HexAsString))
        {
            map[latin9HexAsString] = new List<string>();
        }
        map[latin9HexAsString].Add(codePointHexAsString);
    }
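Once the loop has finished, finding every code point that collapses to a given Latin-9 byte is a plain dictionary read. The following is only a minimal sketch of that lookup: it assumes the dictionary is keyed by the byte's hex value and holds code point hex strings, and the single sample entry is seeded by hand from the list at the top of this answer rather than computed.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Hypothetical stand-in for the map built by the loop; the entry below
        // is copied from the list at the top of this answer, not computed here.
        var map = new Dictionary<string, List<string>>
        {
            ["22"] = new List<string> { "0022", "02BA", "030E", "201C", "201D", "201E", "FF02" }
        };

        // "22" is the hex value of the Latin-9 byte for the plain double quote.
        if (map.TryGetValue("22", out List<string> sources))
        {
            // Print each source code point in the usual U+XXXX notation.
            Console.WriteLine(string.Join(", ", sources.ConvertAll(cp => "U+" + cp)));
        }
    }
}
```

With the real map, the same lookup works for any other byte, for example to see which characters end up as a plain apostrophe or hyphen.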
Interestingly, ISO-8859-15 seems to replace more characters than ISO-8859-1, which I did not expect.