python - Decoding JSON encoded as GB2312

Question

Using a GET request, I pull JSON from the Google Geocoding API:

import urllib, urllib2

url = "http://maps.googleapis.com/maps/api/geocode/json"
params = {'address': 'ivory coast', 'sensor': 'false'}
# Build the query string and send the GET request (Python 2)
request = urllib2.Request(url + "?" + urllib.urlencode(params))
response = urllib2.urlopen(request)
st = response.read()  # raw bytes (a Python 2 str), not yet decoded

The result looks like this:

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "CÃ´te d'Ivoire",
               "short_name" : "CI",
               "types" : [ "country", "political" ]
            }
         ],
         "formatted_address" : "CÃ´te d'Ivoire",
         "geometry" : { ... # rest snipped

As you can see, there is an encoding problem with the country name. I tried to guess the encoding like this:

import chardet
encoding = chardet.detect(st)
print "String is encoded in {0} (with {1}% confidence).".format(encoding['encoding'], encoding['confidence']*100)

This returns:

String is encoded in GB2312 (with 99.0% confidence).

What I want to know is how to convert this into a dictionary in which ô (o with a circumflex) is displayed correctly.

I tried:

st = st.decode(encoding['encoding']).encode('utf-8')

But then I get:

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "Cä¹ˆte d'Ivoire",
               "short_name" : "CI",
               "types" : [ "country", "political" ]
            }
         ],
         "formatted_address" : "Cä¹ˆte d'Ivoire",
         "geometry" : { ... # rest snipped

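Both garbled outputs can be reproduced offline if one assumes the body really was UTF-8 and that it was shown in a terminal using a Western code page such as cp1252 (an assumption about the asker's console, not something stated in the question). The sketch uses Python 3, while the question uses Python 2. The UTF-8 bytes for ô, C3 B4, display as Ã´; decoding those same two bytes as GB2312 yields the unrelated character 么 (U+4E48), whose UTF-8 bytes then display as ä¹ˆ.

```python
# UTF-8 bytes of the correct name, as the API would send them
raw = "C\xf4te d'Ivoire".encode("utf-8")  # b"C\xc3\xb4te d'Ivoire"

# Symptom 1: UTF-8 bytes shown in a cp1252 console (assumed console encoding)
first_display = raw.decode("cp1252")  # "CÃ´te d'Ivoire"

# Symptom 2: the question's attempt, decode as GB2312 then re-encode as UTF-8
reencoded = raw.decode("gb2312").encode("utf-8")  # C3 B4 -> U+4E48 -> E4 B9 88
second_display = reencoded.decode("cp1252")  # "Cä¹ˆte d'Ivoire"

# Correct: decode the bytes as UTF-8
correct = raw.decode("utf-8")  # "Côte d'Ivoire"
```

Since both symptoms fall out of the same UTF-8 bytes, chardet's GB2312 guess looks like a misdetection on a short, mostly ASCII input rather than the real encoding.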
score 3 · Accepted Answer

Google API results are always encoded in UTF-8; you can even check this yourself in the HTTP Content-Type header of their responses.
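A minimal Python 3 sketch of following that advice: read the charset parameter from the Content-Type header and decode with it, falling back to UTF-8 when no charset is declared. The helper names `charset_from_content_type` and `decode_json_body`, and the sample body, are hypothetical; the header parsing relies on the standard library's `email.message.Message.get_content_charset`.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the (lower-cased) charset from a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_json_body(raw, content_type):
    """Decode raw response bytes with the declared charset, then parse the JSON."""
    text = raw.decode(charset_from_content_type(content_type))
    return json.loads(text)


# Hypothetical response body, shaped like the one in the question
body = b'{"results": [{"formatted_address": "C\xc3\xb4te d\'Ivoire"}]}'
data = decode_json_body(body, "application/json; charset=UTF-8")
name = data["results"][0]["formatted_address"]  # "Côte d'Ivoire"
```

With `urllib.request`, `response.headers` is already an `http.client.HTTPMessage` (a subclass of `email.message.Message`), so `response.headers.get_content_charset("utf-8")` should give the same value without building a `Message` by hand.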

score 2 · Accepted Answer

Once you have decoded it (correctly), don't re-encode it; json works fine with unicode strings.

>>> json.loads(u"[\"C\xf4te d'Ivoire\"]")
[u"C\xf4te d'Ivoire"]
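The same idea in Python 3, as a sketch rather than part of the original answer: decode the bytes exactly once with the correct codec and pass the resulting `str` to `json.loads`. Since Python 3.6, `json.loads` also accepts UTF-8 bytes directly.

```python
import json

# Bytes as they might arrive over the wire (UTF-8 encoded)
raw = '["C\xf4te d\'Ivoire"]'.encode("utf-8")

# Decode once, then parse; do not encode the result again
names = json.loads(raw.decode("utf-8"))
# names == ["Côte d'Ivoire"]; "ô" is a single code point, U+00F4

# Python 3.6+: json.loads detects UTF-8 bytes by itself
names_from_bytes = json.loads(raw)
```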
