I wrote the following function:

def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)

    for i in ids: # I have absolutely no idea why this seems to be required:
        pass      # without this loop, the last AssessmentEntries fail to update 
                  # in the following for loop.

    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1

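It does what it is supposed to do: it updates the relevant number of entries in assessment_entries_qs with each rating (starting at lowest_rating), as specified in amounts. Here is a simple example: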

>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]

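However, if I do not iterate over ids before iterating over amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), it always results in exactly 84 AssessmentEntries not being updated.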

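Interestingly, it is always the last iteration of the second for loop that results in no updates at all (although the rest of the code in that iteration executes properly), along with part of the previous iteration. The queryset is ordered with order_by('?') before being passed to this function, and the intended result is achieved if I simply add the preceding "empty" for loop, so there does not appear to be anything wrong with my data.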

A few more details, in case they prove to be relevant:

  • AssessmentEntry.rating is a standard IntegerField(null=True, blank=True).
  • I am using this function purely for testing purposes, so I have only been executing it from iPython.
  • The test database is SQLite.

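Question: Can someone explain why I seem to need to iterate over ids, despite not actually touching the data in any way, and why, without doing so, the function still (sort of) executes correctly, but always fails to update the last few items in the queryset despite apparently still iterating over them?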

1 Answer

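QuerySets and QuerySet slices are evaluated lazily. Iterating over ids executes the query and makes ids behave like a static list rather than a QuerySet. So when you loop over ids first, entries later becomes a fixed set of values; but if you do not loop over ids, entries is just a subquery with a LIMIT clause added to represent the slice you took.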
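To make the lazy-versus-evaluated distinction concrete, here is a minimal pure-Python sketch. LazyIds and fetch are hypothetical names invented for this illustration; this is a toy model of the behavior described above, not Django's actual implementation, and it assumes slices with explicit start and stop.

```python
import random


class LazyIds:
    """Toy stand-in for a lazy QuerySet (hypothetical, not Django code)."""

    def __init__(self, fetch, start=0, stop=None):
        self._fetch = fetch    # callable that "runs the query" and returns a list
        self._start = start
        self._stop = stop
        self._cache = None     # filled the first time the object is iterated

    def __iter__(self):
        if self._cache is None:
            # First iteration: run the "query" and remember the results.
            self._cache = self._fetch()[self._start:self._stop]
        return iter(self._cache)

    def __getitem__(self, s):
        if self._cache is not None:
            # Already evaluated: slicing returns plain values from the cache.
            return self._cache[s]
        # Not evaluated yet: slicing only describes a narrower "query".
        return LazyIds(self._fetch, self._start + s.start, self._start + s.stop)


def fetch():
    rows = [1, 2, 3, 4, 5]
    random.shuffle(rows)       # plays the role of ORDER BY RANDOM()
    return rows


ids = LazyIds(fetch)
lazy_slice = ids[0:2]          # still lazy: would re-run fetch() when used

for _ in ids:                  # the "empty" loop forces evaluation
    pass

fixed_slice = ids[0:2]         # now a plain list taken from the cached results
print(type(lazy_slice).__name__, type(fixed_slice).__name__)
```

Before the loop, every slice would trigger its own freshly shuffled fetch, so two slices can overlap; after the loop, all slices come from the same cached ordering.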

Here is what happens in detail:

def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet
    # from your calling code, it would probably generate a query like this:
    # SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is a lazy QuerySet (a ValuesListQuerySet in older Django) that adds "SELECT id"
    # to the query that assessment_entries_qs would generate.
    # So ids is now something like:
    # SELECT id FROM assessments ORDER BY RANDOM()

    # we omit the loop

    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        # SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query which is something like this:
        # UPDATE assessments SET rating=? WHERE id IN 
        # (SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start)
        start = end_mark
        rating += 1

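Because the entries subquery is executed again on every update, and it has a random order, the slicing you do is meaningless! This function has no deterministic behavior.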
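The effect can be reproduced outside Django with the standard-library sqlite3 module. The sketch below is only an approximation of the SQL the ORM would generate: the table layout and the helper name rate_with_subquery are assumptions made for this illustration.

```python
import sqlite3


def rate_with_subquery(conn, amounts, lowest_rating=1):
    """Run one UPDATE per amount; the subquery re-shuffles the rows every time."""
    start, rating = 0, lowest_rating
    for amount in amounts:
        conn.execute(
            "UPDATE assessments SET rating = ? WHERE id IN "
            "(SELECT id FROM assessments ORDER BY RANDOM() LIMIT ? OFFSET ?)",
            (rating, amount, start),
        )
        start += amount
        rating += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessments (id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO assessments (id) VALUES (?)", [(i,) for i in range(1, 11)])

rate_with_subquery(conn, (2, 4, 3, 1))

# Count how many rows ended up with any rating; the result varies between runs.
rated = conn.execute(
    "SELECT COUNT(*) FROM assessments WHERE rating IS NOT NULL"
).fetchone()[0]
print(rated, "of 10 rows received a rating")
```

Each UPDATE picks its rows from a new random ordering, so later updates can overwrite rows that were already rated while other rows are never selected at all.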

However, when you iterate over ids, you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.

Let's see what happens when you use the loop:

ids = assessment_entries_qs.values_list('id', flat=True)

# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
# SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass

# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query

for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    # UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1

If you need to evaluate a QuerySet eagerly to get all of its values at once, rather than performing a no-op iteration, just convert it to a list:

    ids = list(assessment_entries_qs.values_list('id', flat=True))
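The same idea can be sketched without Django using the standard-library sqlite3 module: run the random ordering once, keep the ids in a plain list, and slice that list in Python. The table layout and the helper name rate_with_fixed_ids are assumptions made for this illustration, not part of the original code.

```python
import sqlite3


def rate_with_fixed_ids(conn, amounts, lowest_rating=1):
    """Shuffle once, then update each chunk using explicit id values."""
    # The random ordering is executed exactly once and stored as a list.
    ids = [row[0] for row in conn.execute("SELECT id FROM assessments ORDER BY RANDOM()")]
    start, rating = 0, lowest_rating
    for amount in amounts:
        chunk = ids[start:start + amount]
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            f"UPDATE assessments SET rating = ? WHERE id IN ({placeholders})",
            [rating, *chunk],
        )
        start += amount
        rating += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessments (id INTEGER PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO assessments (id) VALUES (?)", [(i,) for i in range(1, 11)])

rate_with_fixed_ids(conn, (2, 4, 3, 1))

# Number of rows per rating; every row is rated exactly once.
counts = dict(conn.execute("SELECT rating, COUNT(*) FROM assessments GROUP BY rating"))
print(counts)  # {1: 2, 2: 4, 3: 3, 4: 1}
```

Which rows get which rating is still random, but the chunks no longer overlap, so the number of rows per rating always matches amounts.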

The Django documentation also covers in detail when a QuerySet is evaluated.

answered 2013-02-17T00:12:52.917