python - Celery task in Flask for uploading and resizing images and storing them to Amazon S3

Question

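I'm trying to create a Celery task that uploads and resizes an image before storing it to Amazon S3, but it doesn't work as expected. Without the task, everything works fine. This is the code so far: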

Stack trace

Traceback (most recent call last):
  File "../myVE/lib/python2.7/site-packages/kombu/messaging.py", line 579, in _receive_callback
    decoded = None if on_m else message.decode()
  File "../myVE/lib/python2.7/site-packages/kombu/transport/base.py", line 147, in decode
    self.content_encoding, accept=self.accept)
  File "../myVE/lib/python2.7/site-packages/kombu/serialization.py", line 187, in decode
    return decode(data)
  File "../myVE/lib/python2.7/site-packages/kombu/serialization.py", line 74, in pickle_loads
    return load(BytesIO(s))
  File "../myVE/lib/python2.7/site-packages/werkzeug/datastructures.py", line 2595, in __getattr__
    return getattr(self.stream, name)
  File "../myVE/lib/python2.7/site-packages/werkzeug/datastructures.py", line 2595, in __getattr__
    return getattr(self.stream, name)
    ...
RuntimeError: maximum recursion depth exceeded while calling a Python object

views.py

from PIL import Image

from flask import Blueprint, redirect, render_template, request, url_for

from myapplication.forms import UploadForm
from myapplication.tasks import upload_task


main = Blueprint('main', __name__)

@main.route('/upload', methods=['GET', 'POST'])
def upload():
    form = UploadForm()
    if form.validate_on_submit():
        upload_task.delay(form.title.data, form.description.data,
                          Image.open(request.files['image']))
        return redirect(url_for('main.index'))
    return render_template('upload.html', form=form)

tasks.py

from StringIO import StringIO

from flask import current_app

from myapplication.extensions import celery, db
from myapplication.helpers import resize, s3_upload
from myapplication.models import MyObject


@celery.task(name='tasks.upload_task')
def upload_task(title, description, source):
    stream = StringIO()
    target = resize(source, current_app.config['SIZE'])
    target.save(stream, 'JPEG', quality=95)
    stream.seek(0)
    obj = MyObject(title=title, description=description, url=s3_upload(stream))
    db.session.add(obj)
    db.session.commit()

score 13 · Accepted Answer

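I know this is a very old question, but I was struggling to pass a file's contents to a Celery task. I kept getting errors when trying to follow what others had done, so I wrote this up in the hope that it helps others in the future.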

TL;DR

Send the file contents to the Celery task with base64 encoding
Decode the data in the Celery task and use io.BytesIO for the stream

Long answer

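I was not interested in saving the image to disk and reading it back, so I wanted to pass along the data needed to reconstruct the file in the background.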

When I tried to follow what others suggested, I kept getting encoding errors. Some of them were:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
TypeError: initial_value must be str or None, not bytes

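The TypeError was thrown by io.StringIO. Trying to decode the data to get rid of the UnicodeDecodeError did not make much sense. Since the data is binary in the first place, I tried an io.BytesIO instance instead, and that worked perfectly. The only other thing I needed to do was encode the file's stream with base64, and then I could pass the content to the Celery task.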
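As a small illustration of the two points above, here is a standard-library-only sketch. The payload is made up for the example; it only imitates the first bytes of a JPEG, which is why decoding it as UTF-8 fails at position 0.

```python
import base64
import io

# Made-up binary payload that starts like JPEG data (0xff 0xd8).
raw = b"\xff\xd8\xff\xe0 not a real image"

try:
    io.StringIO(raw)  # a text stream refuses bytes
    stringio_error = None
except TypeError as exc:
    stringio_error = exc

# Encode for transport, then decode and wrap in a binary stream.
encoded = base64.b64encode(raw)
stream = io.BytesIO(base64.b64decode(encoded))
restored = stream.read()
```

The bytes read back from the BytesIO stream are identical to the original payload, which is what the task needs in order to rebuild the file.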

Code samples

images.py

import base64

file_.stream.seek(0) # start from beginning of file
# some of the data may not be defined
data = {
  'stream': base64.b64encode(file_.read()),
  'name': file_.name,
  'filename': file_.filename,
  'content_type': file_.content_type,
  'content_length': file_.content_length,
  'headers': {header[0]: header[1] for header in file_.headers}
}

###
# add logic to sanitize required fields
###

# define the params for the upload (here I am using AWS S3)
bucket, s3_image_path = AWS_S3_BUCKET, AWS_S3_IMAGE_PATH
# import and call the background task
from async_tasks import upload_async_photo 
upload_async_photo.delay(
  data=data,
  image_path=s3_image_path,
  bucket=bucket)

async_tasks.py

import base64, io
from werkzeug.datastructures import FileStorage

@celery.task
def upload_async_photo(data, image_path, bucket):
    bucket = get_s3_bucket(bucket) # get bucket instance
    try:
        # decode the stream
        data['stream'] = base64.b64decode(data['stream'])
        # create a BytesIO instance
        # https://docs.python.org/3/library/io.html#binary-i-o
        data['stream'] = io.BytesIO(data['stream'])
        # create the file structure
        file_ = FileStorage(**data)
        # upload image
        bucket.put_object(
                Body=file_,
                Key=image_path,
                ContentType=data['content_type'])
    except Exception as e:
        print(str(e))

Edit

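I also changed what content Celery accepts and how it serializes data. To avoid problems when passing the bytes instance to the Celery task, I had to add the following to my config: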

CELERY_ACCEPT_CONTENT = ['pickle']
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'pickle'
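A possible alternative to enabling pickle, offered here as an assumption rather than something verified against this setup: base64 output is plain ASCII, so decoding it to a str keeps the payload JSON-serializable. The helper names `encode_payload` and `decode_payload` are hypothetical, and `json.dumps` only stands in for what a JSON task serializer would do.

```python
import base64
import io
import json


def encode_payload(raw_bytes):
    """Return a JSON-safe str holding the base64 form of raw_bytes."""
    return base64.b64encode(raw_bytes).decode("ascii")


def decode_payload(text):
    """Rebuild a binary stream from the str produced by encode_payload."""
    return io.BytesIO(base64.b64decode(text))


raw = b"\xff\xd8\xff binary data"
# Simulate the message crossing a JSON serializer.
message = json.dumps({"stream": encode_payload(raw)})
received = json.loads(message)
round_trip_ok = decode_payload(received["stream"]).read() == raw
```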

score 5 · Accepted Answer

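It looks like you are trying to pass the entire uploaded file as part of the Celery message. I imagine that is what is causing you trouble. I would recommend seeing whether you can save the file to the web server as part of the view, and then have the message (the "delay" arguments) contain the filename rather than the entire file's data. The task can then read the file from the hard drive, upload it to S3, and delete the local copy.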
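A minimal sketch of that idea, using only the standard library. The function names `save_upload` and `process_upload` are hypothetical: in a real app the second one would be the Celery task, its body would resize the image and upload it to S3, and the web process and the worker would need access to the same filesystem.

```python
import io
import os
import shutil
import tempfile


def save_upload(stream):
    """Copy an uploaded file-like object to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".upload")
    with os.fdopen(fd, "wb") as out:
        shutil.copyfileobj(stream, out)
    return path


def process_upload(path):
    """Stand-in for the task: read the file, then delete the local copy."""
    try:
        with open(path, "rb") as f:
            data = f.read()  # a real task would resize and upload to S3 here
    finally:
        os.remove(path)
    return len(data)


path = save_upload(io.BytesIO(b"fake image bytes"))
size = process_upload(path)  # with Celery: process_upload.delay(path)
```

Only the short path string would travel through the broker, so the message stays small regardless of the image size.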

score 1 · Accepted Answer

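I know this is a very old post, but in case it helps someone: the best approach here is to have the task download the image from an external source and then perform the asynchronous operation.

I had a similar issue with the async task after fixing the serialization problem as suggested by @Obeyed (there was no need to change the Celery config, though), but I ended up dropping that solution because the file contents can be very large and consume a lot of resources in the message broker.

If you want to delegate the async task to a separate worker machine, @Mark Hildreth's approach will not help much, because the worker cannot read the web server's local disk.

In that case, a better approach may be to upload the original image synchronously, and then asynchronously download, resize, and re-upload the image to replace the original.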
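The flow could look roughly like the sketch below. Everything in it is a stand-in: `FakeStorage` is a hypothetical in-memory replacement for an S3 bucket, and `shrink` is a placeholder for the real resize step. The point is only that the message carries the object key, not the file contents.

```python
class FakeStorage:
    """Hypothetical in-memory stand-in for an S3 bucket."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def shrink(data):
    """Placeholder for the real resize; it just truncates the bytes."""
    return data[:4]


def upload_view(storage, key, data):
    """Synchronous part: store the original and return the key to enqueue."""
    storage.put(key, data)
    return key


def resize_task(storage, key):
    """Asynchronous part: download, resize, and overwrite the original."""
    storage.put(key, shrink(storage.get(key)))


storage = FakeStorage()
key = upload_view(storage, "images/1.jpg", b"original bytes")
# With Celery, only the key would be sent: resize_task.delay(key)
resize_task(storage, key)
```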

score 0 · Accepted Answer

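Old question, but I just ran into the same problem. The accepted answer did not work for me (I am using Docker instances, so Celery has no access to the producer's filesystem; saving the file to the local filesystem first is also slow).

My solution keeps the file in RAM, so it is much faster. The only downside is that if you need to handle large files (>1GB), you need a server with a lot of RAM.

doc_file is of type werkzeug.datastructures.FileStorage (see the Werkzeug documentation).

Sending the file to the Celery worker: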

entry.delay(doc_file.read(), doc_file.filename, doc_file.name, doc_file.content_length, doc_file.content_type, doc_file.headers)

Receiving the file:

from werkzeug.datastructures import FileStorage
from StringIO import StringIO

@celery.task()
def entry(stream, filename, name, content_length, content_type, headers):
    doc = FileStorage(stream=StringIO(stream), filename=filename, name=name, content_type=content_type, content_length=content_length)
    # Do something with the file (e.g save to Amazon S3)
