Martijn's solution is great, but it forces you to use the file in a context manager (you can't do `out = s3upload(…)` and `print >> out, "Hello"`). The following solution works similarly (in-memory storage up to a certain size), but works both as a context manager and as a regular file (you can do both `with S3WriteFile(…)` and `out = S3WriteFile(…); print >> out, "Hello"; out.close()`):
    import tempfile
    import os

    class S3WriteFile(object):
        """
        File-like context manager that can be written to (and read from),
        and which is automatically copied to Amazon S3 upon closing and deletion.
        """

        def __init__(self, item, max_size=10*1024**2):
            """
            item -- boto.s3.key.Key for writing the file (upon closing). The
            name of the object is set to the item's name (key).

            max_size -- maximum size in bytes of the data that will be
            kept in memory when writing to the file. If more data is
            written, it is automatically rolled over to a file.
            """
            self.item = item
            temp_file = tempfile.SpooledTemporaryFile(max_size)
            # Setting the .name attribute of this object would be useless:
            # when it is used as a context manager, the temporary file is
            # what is returned, and it has a None name. The name is
            # therefore set on the temporary file itself:
            temp_file.name = os.path.join(
                "s3://{}".format(item.bucket.name),
                item.name if item.name is not None else "<???>")
            self.temp_file = temp_file

        def close(self):
            # Rewind so that the whole contents are uploaded:
            self.temp_file.seek(0)
            self.item.set_contents_from_file(self.temp_file)
            self.temp_file.close()

        def __del__(self):
            """
            Write the file contents to S3.
            """
            # The file may have been closed before being deleted:
            if not self.temp_file.closed:
                self.close()

        def __enter__(self):
            return self.temp_file

        def __exit__(self, *args, **kwargs):
            self.close()
            return False

        def __getattr__(self, name):
            """
            Everything not specific to this class is delegated to the
            temporary file, so that objects of this class behave like a
            file.
            """
            return getattr(self.temp_file, name)
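The `max_size` behavior described in the docstring can be checked in isolation. The following is only a small sketch using the standard library: the 16-byte limit is an arbitrary value chosen for illustration, and `_rolled` is a private CPython attribute, read here for inspection only.

```python
import tempfile

# Arbitrary small limit for illustration: the data stays in memory
# until more than 16 bytes have been written.
spool = tempfile.SpooledTemporaryFile(max_size=16)

spool.write(b"0123456789")     # 10 bytes in total: still in memory
rolled_before = spool._rolled  # private attribute, inspection only

spool.write(b"0123456789")     # 20 bytes in total: exceeds max_size
rolled_after = spool._rolled   # the data has moved to a real temporary file

# Reading works the same way whether or not the data was rolled over.
spool.seek(0)
data = spool.read()
spool.close()
```

The caller never has to care where the data lives, which is why `close()` in the class above can simply rewind the file and upload it.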
(Implementation note: instead of delegating many things to `self.temp_file` so that the resulting class behaves like a file, inheriting from `SpooledTemporaryFile` would in principle work. However, this is an old-style class, so `__new__()` is not called and, as far as I can see, a non-default in-memory size for the temporary data cannot be set.)
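The delegation approach can be tried without any S3 access. The sketch below is a simplified illustration, not the class above: `FakeKey` and `UploadOnClose` are hypothetical names, `FakeKey` only imitates the `set_contents_from_file()` method of a boto key, and a plain `io.BytesIO` replaces the spooled temporary file.

```python
import io

class FakeKey(object):
    """Hypothetical stand-in for boto.s3.key.Key: records the upload."""
    def __init__(self):
        self.uploaded = None

    def set_contents_from_file(self, fp):
        self.uploaded = fp.read()

class UploadOnClose(object):
    """Simplified wrapper: delegates to a buffer, uploads it on close."""
    def __init__(self, item):
        self.item = item
        self.buffer = io.BytesIO()

    def close(self):
        if self.buffer.closed:  # already uploaded, nothing to do
            return
        self.buffer.seek(0)
        self.item.set_contents_from_file(self.buffer)
        self.buffer.close()

    def __enter__(self):
        return self.buffer

    def __exit__(self, *args):
        self.close()
        return False

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so write(), read(), seek(), etc. go to the buffer.
        return getattr(self.buffer, name)

# Use as a context manager:
key_a = FakeKey()
with UploadOnClose(key_a) as out:
    out.write(b"Hello\n")

# Use as a regular file:
key_b = FakeKey()
out = UploadOnClose(key_b)
out.write(b"Hello\n")  # resolved through __getattr__
out.close()
```

Both usages end with the same bytes handed to the key, which is the point of combining `__enter__`/`__exit__` with `__getattr__` delegation.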