0

使用该命令将文件从 Linux 系统传输到 Google Cloud Platformgsutil cp时,在尝试处理包含未编码的非英文字符的内容(不仅仅是文件名!)时,它会在一些旧的“.eml”文件中失败统一码。

尝试的命令是:

gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/

错误消息是:

UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)

gsutil rsync给出了一个非常相似的错误。位置 22881 (0x5961) 位于多部分电子邮件源文件的末尾。以下显示了十六进制转储文件内容:

00005960: 20a8 43a4 d1b3 a320 5961 686f 6f21 a95f   .C.... Yahoo!._
00005970: bcaf 203e 2020 7777 772e 7961 686f 6f2e  .. >  www.yahoo.
00005980: 636f 6d2e 7477 0d0a                      com.tw..

我们在位置 0x5961 处看到字节“0xa8”,如错误消息所示,这是问题的根源。由于某种原因gsutil试图对文本进行编码。在支持中文字符的终端中打开文件时,我们会看到:

< 每天都 Yahoo!奇摩 >  www.yahoo.com.tw

Big-5编码时,第一个汉字“-”是0xa843。一个简单的解决方法是将文件扩展名重命名为“.eml”以外的其他名称,例如“.eml.bak”,这样gsutil就不会处理文件内容。遗憾的是,在批量传输时很难提前知道是否存在这种非英文字符的文件,并且整个过程可以多次停止。

以下是完整的错误消息:

darsenlu@devmodel:~/Home$ gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
Copying file:///home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml [Content-Type=message/rfc822]...
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 122, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 444, in main
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 780, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1124, in RunCommand
    seek_ahead_iterator=seek_ahead_iterator)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1525, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1596, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2316, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 709, in _CopyFuncWrapper
    preserve_posix=cls.preserve_posix_attrs)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 924, in CopyFunc
    preserve_posix=preserve_posix)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 3957, in PerformCopy
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2250, in _UploadFileToObject
    parallel_composite_upload, logger)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2066, in _DelegateUploadFileToObject
    elapsed_time, uploaded_object = upload_delegate()
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2227, in CallNonResumableUpload
    gzip_encoded=gzip_encoded_file)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 1762, in _UploadFileToObjectNonResumable
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 388, in UploadObject
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1712, in UploadObject
    gzip_encoded=gzip_encoded)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1534, in _UploadObject
    global_params=global_params)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1182, in Insert
    upload=upload, upload_config=upload_config)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 703, in _RunMethod
    download)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 679, in PrepareHttpRequest
    upload.ConfigureRequest(upload_config, http_request, url_builder)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 763, in ConfigureRequest
    self.__ConfigureMultipartRequest(http_request)
  File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 823, in __ConfigureMultipartRequest
    g.flatten(msg_root, unixfrom=False)
  File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.6/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.6/email/generator.py", line 272, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.6/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.6/email/generator.py", line 361, in _handle_message
    payload = self._encode(payload)
  File "/usr/lib/python3.6/email/generator.py", line 412, in _encode
    return s.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)

Linux 系统是 Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-76-generic x86_64)。

4

1 回答 1

2

我把你的字符串用中文字符,并且能够重现你的错误。我在更新到gsutil 4.62. 这是合并的 PR问题跟踪器作为参考。

通过运行更新 Cloud SDK:

gcloud components update
于 2021-05-20T00:50:27.537 回答