
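Hmm, I'm not a fan of UTF-8 in Python; I can't seem to figure out how to solve this. As you can see, I'm already trying to base64-encode the value, but it looks like Python tries to convert it from UTF-8 to ASCII first...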

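In general, I'm trying to POST form data containing UTF-8 characters using urllib2. I suppose it's the same question as "How to send utf-8 content in a urllib2 request?", although that one has no working answer. I'm trying to send just a byte string by base64-encoding it.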

Traceback (most recent call last):
  File "load.py", line 165, in <module>
    main()
  File "load.py", line 17, in main
    beers()
  File "load.py", line 157, in beers
    resp = send_post("http://localhost:9000/beers", beer)
  File "load.py", line 64, in send_post
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
  File "load.py", line 49, in encode_multipart_data
    lines.extend (encode_field (name))
  File "load.py", line 34, in encode_field
    '', base64.b64encode(u"%s" % data[field_name]))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

Code:

def random_string (length):
    return ''.join (random.choice (string.ascii_letters) for ii in range (length + 1))


def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
      return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"' % field_name,
              'Content-Transfer-Encoding: base64',
              '', base64.b64encode(u"%s" % data[field_name]))

    def encode_file (field_name):
      filename = files [field_name]
      file_size = os.stat(filename).st_size
      file_data = open(filename, 'rb').read()
      file_b64 = base64.b64encode(file_data)
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
              'Content-Type: %s' % get_content_type(filename),
              'Content-Transfer-Encoding: base64',
              '', file_b64)

    lines = []
    for name in data:
      lines.extend (encode_field (name))
    for name in files:
      lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str(len(body))}

    return body, headers


def send_post (url, data, files={}):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host())
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
    return connection.getresponse()

The JSON for the beer object (this is what gets passed as data to encode_multipart_data):

    {
    "name"        : "Yuengling Oktoberfest",
    "brewer"      : "Yuengling Brewery",
    "description" : "America’s Oldest Brewery is proud to offer Yuengling Oktoberfest Beer. Copper in color, this medium bodied beer is the perfect blend of roasted malts with just the right amount of hops to capture a true representation of the style. Enjoy a Yuengling Oktoberfest Beer in celebration of the season, while supplies last!",
    "abv"         : 5.2, 
    "ibu"         : 26, 
    "type"        : "Lager",
    "subtype"     : "",
    "color"       : "",
    "seasonal"    : true,
    "servingTemp" : "Cold",
    "rating"      : 3,
    "inProduction": true  
    }

1 Answer


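You can't base64-encode Unicode, only byte strings. In Python 2.7, passing a Unicode string to a function that expects a byte string triggers an implicit conversion to a byte string using the ascii codec, which causes the error you're seeing: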

>>> base64.b64encode(u'America\u2019s')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

So first encode it to a byte string using a suitable encoding:

>>> base64.b64encode(u'America\u2019s'.encode('utf8'))
'QW1lcmljYeKAmXM='
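
Applied to the question's encode_field, the same idea could look roughly like the sketch below. The helper name b64_field_value and the text_type alias are hypothetical (they are not in the original code), and the sketch assumes the server expects UTF-8. Non-string values such as 5.2 or True are converted to text first, then encoded to UTF-8 bytes, and only then passed to base64.b64encode.

```python
import base64

# Hypothetical alias so the sketch runs on both Python 2 and Python 3.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def b64_field_value(value):
    """Return base64 of the UTF-8 bytes of value (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already a byte string: encode it as-is.
        raw = value
    else:
        # Numbers, booleans, etc. become text first.
        if not isinstance(value, text_type):
            value = text_type(value)
        # Explicit encode avoids the implicit ascii conversion.
        raw = value.encode('utf-8')
    return base64.b64encode(raw)
```

In encode_field, the last tuple element would then be b64_field_value(data[field_name]) instead of base64.b64encode(u"%s" % data[field_name]).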
Answered 2013-09-17T06:18:08.227