django - How to serialize a binary file for use in a Celery task

Question

I recently integrated Celery (more specifically django-celery) into one of my applications. I have the following model in the application.

from django.db import models


class UserUploadedFile(models.Model):
    original_file = models.FileField(upload_to='uploads/')
    txt = models.FileField(upload_to='uploads/')
    pdf = models.FileField(upload_to='uploads/')
    doc = models.FileField(upload_to='uploads/')

    def convert_to_others(self):
        # Code to convert the original file to other formats
        pass

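Now, once a user uploads a file, I want to convert the original file to txt, pdf and doc formats. Calling the convert_to_others method is a fairly expensive process, so I plan to run it asynchronously with Celery. To that end I wrote a simple Celery task: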

@celery.task(default_retry_delay=bdev.settings.TASK_RETRY_DELAY)
def convert_ufile(file, request):
    """ 
    This task method would call a UserUploadedFile object's convert_to_others
    method to do the file conversions.

    The best way to call this task would be doing it asynchronously
    using apply_async method.
    """
    try:
        file.convert_to_others()
    except Exception as err:
        # If the task fails, log the exception and retry in 30 secs
        log.LoggingMiddleware.log_exception(request, err)
        raise convert_ufile.retry(exc=err)
    return True

I then call the task like this:

ufile = get_object_or_404(models.UserUploadedFile, pk=id)
tasks.convert_ufile.apply_async(args=[ufile, request])

Now, when apply_async is called, it raises the following exception:

PicklingError: Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed

I think this happens because Celery (by default) uses the pickle library to serialize data, and pickle can't serialize binary files.
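One nuance worth noting: pickle handles raw binary data (`bytes`) without trouble. What it refuses are certain file-like objects that wrap data, such as an open file handle (or, in Python 2, the `cStringIO` buffer named in the traceback above). The stdlib-only sketch below illustrates the distinction under Python 3; `can_pickle` is a hypothetical helper, and the exact error text differs from the Python 2 message in the question.

```python
import pickle
import tempfile


def can_pickle(obj):
    """Hypothetical helper: True if pickle can serialize obj, else False."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# Raw binary content: plain bytes pickle fine.
payload = b"%PDF-1.4 fake binary content \x00\x01\x02"

with tempfile.TemporaryFile() as handle:
    handle.write(payload)
    handle.seek(0)
    # The open file handle carries OS-level state and cannot be pickled.
    handle_ok = can_pickle(handle)
    # The bytes read back out of it can be.
    bytes_ok = can_pickle(handle.read())

print("file handle picklable:", handle_ok)
print("bytes picklable:", bytes_ok)
```

So the practical question is less "how do I pickle a binary file" and more "which object am I actually handing to the serializer".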

Question

Is there another serializer that can serialize binary files on its own? If not, how can I serialize a binary file with the default pickle serializer?

score 7 · Accepted Answer

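You are right that Celery is trying to pickle data that does not support pickling. But even if you found a way to serialize the data you want to send to the Celery task, I wouldn't do it.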

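It is always a good idea to send as little data as possible to Celery tasks, so in your case I would pass only the id of the UserUploadedFile instance. With that id you can fetch your object inside the Celery task and call convert_to_others().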
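A framework-free sketch of this pattern, assuming a plain dict stands in for the database table and `json` stands in for Celery's message serializer (Celery supports JSON as a task serializer, and an integer id serializes trivially with it). All names here (`UploadedFileRecord`, `FAKE_DB`, `enqueue_convert`, `worker_convert`) are hypothetical; in a real project the rough equivalents would be `convert_ufile.delay(ufile.pk)` on the caller side and `UserUploadedFile.objects.get(pk=pk)` inside the task.

```python
import json
from dataclasses import dataclass, field


@dataclass
class UploadedFileRecord:
    """Hypothetical stand-in for a UserUploadedFile row."""
    pk: int
    original_name: str
    converted: list = field(default_factory=list)

    def convert_to_others(self):
        # Placeholder for the expensive conversion step.
        self.converted = ["txt", "pdf", "doc"]


# Hypothetical in-memory "table" keyed by primary key.
FAKE_DB = {1: UploadedFileRecord(pk=1, original_name="report.odt")}


def enqueue_convert(pk):
    """Build the queue message: only the primary key travels, not the object."""
    return json.dumps({"task": "convert_ufile", "args": [pk]})


def worker_convert(message):
    """Decode the message, re-fetch the record by pk, then convert it."""
    body = json.loads(message)
    (pk,) = body["args"]
    record = FAKE_DB[pk]  # a real task would query the database here
    record.convert_to_others()
    return record.converted


message = enqueue_convert(1)
print(message)
print(worker_convert(message))
```

The message stays a few bytes long no matter how large the uploaded file is, and nothing in it needs pickle.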

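Also note that the object may change its state (or even be deleted) before the task runs. That makes fetching the object inside the Celery task safer than sending a full copy of it.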
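The timing argument can be illustrated with the same kind of stand-in: a snapshot copied at enqueue time (roughly what pickling the instance into the message amounts to) goes stale if the row changes before the worker runs, whereas a lookup by id sees the current state and can detect that the row was deleted. `Record`, `FAKE_DB` and both worker functions are hypothetical; a Django task would typically catch `UserUploadedFile.DoesNotExist` for the deleted case.

```python
import copy
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a database row."""
    pk: int
    status: str


FAKE_DB = {7: Record(pk=7, status="uploaded")}


def worker_with_snapshot(snapshot):
    # Uses whatever state was captured when the task was queued.
    return snapshot.status


def worker_with_id(pk):
    # Re-fetches at execution time; returns None if the row is gone.
    record = FAKE_DB.get(pk)
    return None if record is None else record.status


# Enqueue time: one task gets a copy, the other only the primary key.
snapshot = copy.deepcopy(FAKE_DB[7])
queued_pk = 7

# Before the workers run, the row is updated...
FAKE_DB[7].status = "replaced"
print(worker_with_snapshot(snapshot))  # stale: "uploaded"
print(worker_with_id(queued_pk))       # current: "replaced"

# ...and later deleted.
del FAKE_DB[7]
print(worker_with_id(queued_pk))       # None: the task can skip cleanly
```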

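In short, sending only the instance id and re-fetching the object inside the task gives you a few things:

You send less data to the queue.
You don't have to deal with data inconsistency issues.
It actually works in your case. :)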

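The only "downside" is that you need one extra, cheap SELECT query to re-fetch your data, which overall seems like a good trade compared with the problems above, doesn't it?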
