c# - Utf7Encoding truncates text

Question

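I'm having a problem with the Utf7Encoding class truncating the "+4" sequence, and I'd like to know why this happens. I tried Utf8Encoding to get the string from the byte[] array and it seems to work fine. Are there any similar known issues with Utf8? Essentially, I use the output of this conversion to construct html from an rtf string.

Here is the snippet: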

    UTF7Encoding utf = new UTF7Encoding(); 
    UTF8Encoding utf8 = new UTF8Encoding(); 

    string test = "blah blah 9+4"; 

    // Cast each char straight to a byte; no encoder is involved here.
    char[] chars = test.ToCharArray();
    byte[] charBytes = new byte[chars.Length];

    for (int i = 0; i < chars.Length; i++)
    {
        charBytes[i] = (byte)chars[i];
    }

    string resultString = utf8.GetString(charBytes); 
    string resultStringWrong = utf.GetString(charBytes); 

    Console.WriteLine(resultString);       // blah blah 9+4
    Console.WriteLine(resultStringWrong);  // blah blah 9

score 1 · Accepted Answer

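You are not converting the string to utf7 bytes correctly. You should call utf.GetBytes() instead of casting the chars to bytes.

I suspect that in utf7 the ASCII code corresponding to '+' is actually reserved for encoding international Unicode characters.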
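That suspicion matches how UTF-7 is defined: in RFC 2152 a '+' opens a run of modified Base64, and a literal plus sign is written as "+-". The following is a minimal sketch of that behaviour, not code from the original poster; the class name `Utf7PlusDemo` is made up for illustration, and on .NET 5 or later `UTF7Encoding` is marked obsolete, so the compiler is expected to emit a warning.

```csharp
using System;
using System.Text;

class Utf7PlusDemo
{
    static void Main()
    {
        // UTF7Encoding is obsolete on .NET 5+; expect a compiler warning there.
        UTF7Encoding utf7 = new UTF7Encoding();

        // Proper encoding: the encoder escapes the literal '+' as "+-".
        byte[] encoded = utf7.GetBytes("9+4");
        Console.WriteLine(Encoding.ASCII.GetString(encoded)); // 9+-4

        // Decoding the escaped form restores the original text.
        Console.WriteLine(utf7.GetString(encoded)); // 9+4

        // Raw ASCII bytes: the decoder reads "+4" as the start of a Base64 run.
        // The question reports that this part is dropped from the result.
        byte[] raw = Encoding.ASCII.GetBytes("9+4");
        Console.WriteLine(utf7.GetString(raw));
    }
}
```

The round trip only works because the same encoding produced the bytes that it later decodes.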

score 1 · Accepted Answer

Converting to a byte array via the char array like this won't work. If you want the string as a charset-specific byte[], do this:

    UTF7Encoding utf = new UTF7Encoding();
    UTF8Encoding utf8 = new UTF8Encoding();

    string test = "blah blah 9+4";

    byte[] utfBytes = utf.GetBytes(test);
    byte[] utf8Bytes = utf8.GetBytes(test);

    string utfString = utf.GetString(utfBytes);
    string utf8String = utf8.GetString(utf8Bytes);

    Console.WriteLine(utfString);
    Console.WriteLine(utf8String);

Output:

    blah blah 9+4
    blah blah 9+4
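On the question of whether Utf8 has a similar problem: the cast-to-byte approach only appeared to work because every character in the test string is ASCII. A minimal sketch of where it breaks, assuming a string with one non-ASCII character (the class name `CastVersusGetBytes` and the sample text are illustrative, not from the thread):

```csharp
using System;
using System.Text;

class CastVersusGetBytes
{
    static void Main()
    {
        string text = "caf\u00E9"; // "café"; U+00E9 is outside ASCII

        // Casting yields the single byte 0xE9, which is not valid UTF-8 by itself.
        byte[] cast = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            cast[i] = (byte)text[i];
        }

        // The encoder emits the two-byte sequence 0xC3 0xA9 for U+00E9.
        byte[] encoded = Encoding.UTF8.GetBytes(text);

        Console.WriteLine(cast.Length);    // 4
        Console.WriteLine(encoded.Length); // 5

        // The invalid byte decodes to the replacement character U+FFFD.
        Console.WriteLine(Encoding.UTF8.GetString(cast) == text);    // False
        Console.WriteLine(Encoding.UTF8.GetString(encoded) == text); // True
    }
}
```

So with Utf8 the failure shows up as replacement characters rather than dropped text, and only once the input leaves the ASCII range.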
