
Has anyone successfully backed up a large Datastore kind to Cloud Storage? This is an experimental feature, so support on Google's end is pretty sketchy.

The kind in question, which we want to back up to Cloud Storage (the eventual goal is to pull it from Cloud Storage into BigQuery), is currently 1.2 TB in size.

The backup is scheduled with the following cron.yaml entry:

- description: BackUp
  url: /_ah/datastore_admin/backup.create?name=OurApp&filesystem=gs&gs_bucket_name=OurBucket&queue=backup&kind=LargeKind
  schedule: every day 00:00
  timezone: America/Regina
  target: ah-builtin-python-bundle

We keep hitting the following error message:

Traceback (most recent call last):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 182, in handle
    input_reader, shard_state, tstate, quota_consumer, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 263, in process_inputs
    entity, input_reader, ctx, transient_shard_state):
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/handlers.py", line 318, in process_data
    output_writer.write(output, ctx)
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 711, in write
    ctx.get_pool("file_pool").append(self._filename, str(data))
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 266, in append
    self.flush()
  File "/base/data/home/apps/s~steprep-prod-hrd/prod-339.366560204640641232/lib/mapreduce/output_writers.py", line 288, in flush
    f.write(data)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 297, in __exit__
    self.close()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 291, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 427, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 250, in _make_call
    rpc.check_success()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 570, in check_success
    self.__rpc.CheckSuccess()
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
    raise self.exception
DeadlineExceededError: The API call file.Close() took too long to respond and was cancelled.

1 Answer


There appears to be an undocumented 30-second limit on write operations from GAE to Cloud Storage. It applies to writes made from a backend as well, so the maximum size of a file you can create in Cloud Storage from GAE depends on your throughput. Our solution is to split the output: whenever a writer task approaches 20 seconds, it closes the current file and opens a new one, and we join the parts locally afterwards. For us this yields files of roughly 500 KB (compressed), so it may not be an acceptable solution for you...
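
The rotation logic looks roughly like the sketch below. It is only a minimal illustration written against the old (now deprecated) Files API that appears in the traceback above; the function name write_rotated, the object-naming scheme and the 20-second threshold are illustrative, not our exact production code:

    import time

    from google.appengine.api import files

    # Rotate well before the undocumented ~30s limit on file.Close().
    ROTATE_AFTER_SECONDS = 20


    def write_rotated(bucket, base_name, records):
        """Write records to Cloud Storage, starting a new object whenever
        the current one has been open for about 20 seconds."""
        part = 0
        out = None
        file_name = None
        started = 0.0
        for record in records:
            if out is None:
                # Start the next part as a fresh Cloud Storage object.
                file_name = files.gs.create(
                    '/gs/%s/%s-%05d' % (bucket, base_name, part),
                    mime_type='application/octet-stream')
                out = files.open(file_name, 'a')
                started = time.time()
            out.write(record)
            if time.time() - started > ROTATE_AFTER_SECONDS:
                # Close and finalize this part before the deadline can hit.
                out.close()
                files.finalize(file_name)
                out = None
                part += 1
        if out is not None:
            out.close()
            files.finalize(file_name)

Afterwards we just concatenate the parts on a local machine (e.g. with cat base_name-* > base_name) before doing anything else with them.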

answered 2013-04-12T09:53:00.987