I ran into a strange problem while parsing a JSON response from a server. Fetching the response this way (with Content-Type: text/html) had worked fine for the past few months:

string response = "";
using (var client = new System.Net.Http.HttpClient())
{
    var postData = new System.Net.Http.FormUrlEncodedContent(data);
    var clientResult = await client.PostAsync(url, postData);
    if(clientResult.IsSuccessStatusCode)
    {
        response = await clientResult.Content.ReadAsStringAsync();
    }
}
//Parse the response to a JObject...

But when the response arrives with Content-Type: text/html; charset=utf8, it throws an exception saying the Content-Type is invalid.

Exception message: The character set provided in ContentType is invalid. Cannot read content as string using an invalid character set.

So I changed this:

response = await clientResult.Content.ReadAsStringAsync();

to this:

var raw_response = await clientResult.Content.ReadAsByteArrayAsync();
response = Encoding.UTF8.GetString(raw_response, 0, raw_response.Length);

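Now I get the response without an exception, but parsing it throws a parse exception. While debugging I got this (I changed the response to a shorter one for testing):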

var r1 = await clientResult.Content.ReadAsStringAsync();
var r2 = Encoding.UTF8.GetString(await clientResult.Content.ReadAsByteArrayAsync(), 0, raw_response.Length);
System.Diagnostics.Debug.WriteLine("Length: {0} - {1}", r1.Length, r1);
System.Diagnostics.Debug.WriteLine("Length: {0} - {1}", r2.Length, r2);

//Output
Length: 38 - {"version":1,"specialword":"C\u00e3o"}
Length: 39 - {"version":1,"specialword":"C\u00e3o"}

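The JSON looks correct in both cases, but the lengths differ and I don't know why. Copying the output into Notepad++ to reveal hidden characters shows an extra character that appears out of nowhere: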

Length: 38 - {"version":1,"specialword":"C\u00e3o"}
Length: 39 - ?{"version":1,"specialword":"C\u00e3o"}

That extra character is obviously what triggers the parse exception, but I don't know why Encoding.UTF8.GetString produces it.

I have been fighting with this for the past few hours and could really use some help.

1 Answer

Well, I'm surprised that you're getting that behavior; I would have expected Encoding.UTF8.GetString to handle that for you.

What you're seeing, the character value 0xFEFF, is a byte order mark ("BOM"). A BOM is unnecessary in UTF-8 because the byte order is not variable, but it is allowed, as a marker that the following text is encoded UTF-8. (The actual byte sequence is EF BB BF, but then when that's decoded in UTF-8, it becomes code point FEFF.)
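To make the byte-to-character relationship concrete, here is a minimal sketch. The class name `BomDemo` and the two-character payload are made up for illustration; only `Encoding.UTF8.GetString` comes from the question. It decodes a buffer that starts with EF BB BF and shows that the three BOM bytes turn into a single leading U+FEFF character, which accounts for the one-character length difference.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Decodes a hypothetical payload: the UTF-8 BOM (EF BB BF) followed by "{}".
    public static string DecodeWithBom()
    {
        byte[] payload = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
        return Encoding.UTF8.GetString(payload, 0, payload.Length);
    }

    public static void Main()
    {
        string text = DecodeWithBom();
        // Five bytes in, three characters out: U+FEFF, '{', '}'.
        Console.WriteLine(text.Length);               // 3
        Console.WriteLine((int)text[0] == 0xFEFF);    // True
    }
}
```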

If you create your own UTF8Encoding instance, you can tell it whether to include or exclude the BOM. (I think I'm mistaken about that; it may only control whether it includes one when encoding.)
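A quick way to check that doubt is to compare the two constructor settings directly. The sketch below uses a made-up class name (`PreambleDemo`) and a made-up payload. It suggests that the constructor flag changes what `GetPreamble()` reports for writers to emit, while `GetString` leaves a leading U+FEFF in the decoded text either way.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Length of the BOM the encoding reports for writers to emit.
    public static int PreambleLength(bool emitBom)
    {
        return new UTF8Encoding(emitBom).GetPreamble().Length;
    }

    // Decodes a hypothetical BOM-prefixed "{}" payload and reports
    // whether U+FEFF is still the first character afterwards.
    public static bool DecodedStartsWithBom(bool emitBom)
    {
        byte[] payload = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
        string text = new UTF8Encoding(emitBom).GetString(payload, 0, payload.Length);
        return text.Length > 0 && text[0] == '\uFEFF';
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(true));         // 3
        Console.WriteLine(PreambleLength(false));        // 0
        Console.WriteLine(DecodedStartsWithBom(true));   // True
        Console.WriteLine(DecodedStartsWithBom(false));  // True
    }
}
```

So creating a custom UTF8Encoding does not appear to help with decoding; the BOM still has to be removed separately.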

Alternately, you could explicitly test for that and remove the BOM if present, e.g.:

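var raw_response = await clientResult.Content.ReadAsByteArrayAsync();
var r2 = Encoding.UTF8.GetString(raw_response, 0, raw_response.Length);
// Strip a leading byte order mark (U+FEFF) if one is present.
if (r2.Length > 0 && r2[0] == '\uFEFF') {
    r2 = r2.Substring(1);
}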
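The same check can also be done on the raw bytes before decoding, so the string never contains the BOM in the first place. This is only a sketch: `Utf8Text` and `DecodeSkippingBom` are hypothetical names, not part of any library, and the helper assumes the payload really is UTF-8.

```csharp
using System;
using System.Text;

public static class Utf8Text
{
    // Hypothetical helper: decodes UTF-8 bytes, skipping a leading
    // EF BB BF sequence when the buffer starts with one.
    public static string DecodeSkippingBom(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        int offset = 0;
        if (bytes.Length >= bom.Length &&
            bytes[0] == bom[0] && bytes[1] == bom[1] && bytes[2] == bom[2])
        {
            offset = bom.Length;
        }
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'{', (byte)'}' };
        byte[] withoutBom = { (byte)'{', (byte)'}' };
        Console.WriteLine(DecodeSkippingBom(withBom));     // {}
        Console.WriteLine(DecodeSkippingBom(withoutBom));  // {}
    }
}
```

In the question's code this would replace the `Encoding.UTF8.GetString(raw_response, 0, raw_response.Length)` call, e.g. `response = Utf8Text.DecodeSkippingBom(raw_response);`.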
Answered on 2013-08-04T13:17:45.360