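When using the gsutil cp command to transfer files from a Linux system to Google Cloud Platform, it fails on some old ".eml" files whose content (not just the file name!) contains non-English characters that are not encoded as Unicode.
The command attempted was: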
gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
The error message is:
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)
gsutil rsync gives a very similar error. Position 22881 (0x5961) is near the end of the multipart email source file. The hex dump below shows the file content at that point:
00005960: 20a8 43a4 d1b3 a320 5961 686f 6f21 a95f .C.... Yahoo!._
00005970: bcaf 203e 2020 7777 772e 7961 686f 6f2e .. > www.yahoo.
00005980: 636f 6d2e 7477 0d0a com.tw..
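At position 0x5961 we see the byte 0xa8, as indicated by the error message, and this is the source of the problem. For some reason gsutil is trying to encode the text. Opening the file in a terminal that supports Chinese characters shows: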
< 每天都 Yahoo!奇摩 > www.yahoo.com.tw
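The relationship between the bytes in the hex dump and the character in the error message can be checked in a few lines of Python. This is only a sketch of the mechanism, assuming the file content reaches the email generator as text decoded with the "surrogateescape" error handler (which is what the '\udca8' in the error suggests); it does not call gsutil or apitools.

```python
# Bytes copied from the hex dump, offsets 0x5961 through 0x5966.
raw = bytes.fromhex("a843a4d1b3a3")

# Read as Big5, the bytes are ordinary Chinese text.
text = raw.decode("big5")
print(text)

# Read as ASCII with surrogateescape, each undecodable byte 0xNN becomes
# the lone surrogate U+DCNN, so 0xa8 turns into '\udca8'.
escaped = raw.decode("ascii", errors="surrogateescape")
print(ascii(escaped[0]))

# A strict ASCII encode, like s.encode('ascii') in email/generator.py,
# then rejects that surrogate.
try:
    escaped.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exception text matches the one from gsutil except for the position, which is 0 here because the sample starts at the offending byte.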
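In Big-5 encoding, the first Chinese character, "每", is 0xa843. A simple workaround is to rename the file extension to something other than ".eml", for example ".eml.bak", so that gsutil does not process the file content. Unfortunately, in a bulk transfer it is hard to know in advance which files contain such non-English characters, and the whole process can halt multiple times.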
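One way to find the affected files before a bulk transfer might be to scan for ".eml" files that contain any byte outside the ASCII range. The sketch below is a rough pre-check, not part of gsutil: the function name find_non_ascii_files is made up for illustration, and it assumes that any byte >= 0x80 in a ".eml" file is enough to trigger the failure, which I have not verified for every case.

```python
import os


def find_non_ascii_files(root, suffix=".eml"):
    """Yield (path, offset) for each matching file under root that contains
    a byte >= 0x80; offset is the position of the first such byte."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            # Position of the first non-ASCII byte, or None if the file is pure ASCII.
            offset = next((i for i, b in enumerate(data) if b >= 0x80), None)
            if offset is not None:
                yield path, offset


def report(root):
    """Print one line per suspicious file so it can be renamed before uploading."""
    for path, offset in find_non_ascii_files(root):
        print(f"{path}: first non-ASCII byte at offset {offset} (0x{offset:x})")
```

Calling report("/home/darsenlu/Home/mail") would list candidates for the rename workaround. For the file above, the reported offset should be the first non-ASCII byte in the file, which may come earlier than 0x5961 if other parts of the message also contain such bytes.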
The full error message follows:
darsenlu@devmodel:~/Home$ gsutil cp "/home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml" gs://darsen_backup_monthly/
Copying file:///home/darsenlu/Home/mail/Pan/Fw_ japanese_lyrics.eml [Content-Type=message/rfc822]...
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
gsutil.RunMain()
File "/usr/lib/google-cloud-sdk/platform/gsutil/gsutil.py", line 122, in RunMain
sys.exit(gslib.__main__.main())
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 444, in main
user_project=user_project)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 780, in _RunNamedCommandAndHandleExceptions
_HandleUnknownFailure(e)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
user_project=user_project)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
return_code = command_inst.RunCommand()
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1124, in RunCommand
seek_ahead_iterator=seek_ahead_iterator)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1525, in Apply
arg_checker, should_return_results, fail_on_error)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1596, in _SequentialApply
worker_thread.PerformTask(task, self)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2316, in PerformTask
results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 709, in _CopyFuncWrapper
preserve_posix=cls.preserve_posix_attrs)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 924, in CopyFunc
preserve_posix=preserve_posix)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 3957, in PerformCopy
gzip_encoded=gzip_encoded)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2250, in _UploadFileToObject
parallel_composite_upload, logger)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2066, in _DelegateUploadFileToObject
elapsed_time, uploaded_object = upload_delegate()
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 2227, in CallNonResumableUpload
gzip_encoded=gzip_encoded_file)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 1762, in _UploadFileToObjectNonResumable
gzip_encoded=gzip_encoded)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 388, in UploadObject
gzip_encoded=gzip_encoded)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1712, in UploadObject
gzip_encoded=gzip_encoded)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1534, in _UploadObject
global_params=global_params)
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1182, in Insert
upload=upload, upload_config=upload_config)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 703, in _RunMethod
download)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 679, in PrepareHttpRequest
upload.ConfigureRequest(upload_config, http_request, url_builder)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 763, in ConfigureRequest
self.__ConfigureMultipartRequest(http_request)
File "/usr/lib/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 823, in __ConfigureMultipartRequest
g.flatten(msg_root, unixfrom=False)
File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
self._write(msg)
File "/usr/lib/python3.6/email/generator.py", line 181, in _write
self._dispatch(msg)
File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
meth(msg)
File "/usr/lib/python3.6/email/generator.py", line 272, in _handle_multipart
g.flatten(part, unixfrom=False, linesep=self._NL)
File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
self._write(msg)
File "/usr/lib/python3.6/email/generator.py", line 181, in _write
self._dispatch(msg)
File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
meth(msg)
File "/usr/lib/python3.6/email/generator.py", line 361, in _handle_message
payload = self._encode(payload)
File "/usr/lib/python3.6/email/generator.py", line 412, in _encode
return s.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\udca8' in position 22881: ordinal not in range(128)
The Linux system is Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-76-generic x86_64).