python - Aggregation and filtering in Django

Question

I have this table:

**ID    val1     val2**
1      5         6
2      6         4
3      3         1
4      8         4
5      2         6
6      8         2

Using a filter query in Django, I want to summarize this data so that I get the average of val1 and val2 over every n records. For example, if n=3 then:

**ID     val1     val2**
1      4.7      3.7
2      6.0      4.0

score 0 · Accepted Answer

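You have to define your model first; then you can use Count, Min, Sum and Avg from django.db.models, which let you do:

Table.objects.aggregate(average_val1=Avg('val1'))

The resulting dictionary will have a key called "average_val1". If no such alias is specified, the key is the auto-generated "val1__avg".

More documentation can be found here: Django documentation - Aggregation

Note: you can filter first and then aggregate, so the operation runs on a specific set. E.g.:

Table.objects.filter(pk__in=id_list).aggregate(average_val1=Avg('val1'))

Or you can determine the id limits [id1, id2] for each group of n and then do:

Table.objects.filter(pk__gte=id1, pk__lte=id2).aggregate(average_val1=Avg('val1'))

Using __in, __lte and __gte makes sure you filter only the set of ids you want; you can then aggregate over that set.

__in: in a list, __lte: less than or equal to, and __gte: greater than or equal to.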
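
The per-group id limits can be worked out in plain Python before querying. A minimal sketch, assuming the ids have already been fetched in order (for example with `values_list('pk', flat=True)`) and that `val1` is a field on `Table`; `group_bounds` is a hypothetical helper name, not part of Django:

```python
def group_bounds(ids, n):
    """Return (first_id, last_id) for each complete run of n ids."""
    ids = list(ids)
    full = len(ids) - len(ids) % n  # drop a trailing partial group
    return [(ids[i], ids[i + n - 1]) for i in range(0, full, n)]


# The ids from the question's table, grouped three at a time.
bounds = group_bounds([1, 2, 3, 4, 5, 6], 3)
print(bounds)  # [(1, 3), (4, 6)]

# Each pair would then feed one filtered aggregate, e.g.
# Table.objects.filter(pk__gte=first, pk__lte=last).aggregate(Avg('val1'))
```

This still issues one query per group, so it is probably only reasonable when the number of groups is small.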

score 0 · Accepted Answer

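n = 3  # group size
result_list = []
ind = 0
v1ofn = 0
v2ofn = 0
for row in tname.objects.order_by('id'):
    v1ofn = v1ofn + row.val1
    v2ofn = v2ofn + row.val2
    ind = ind + 1
    if ind == n:
        # A full group of n rows has been summed; store its averages and reset
        result_list.append([v1ofn / float(n), v2ofn / float(n)])
        v1ofn = 0
        v2ofn = 0
        ind = 0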

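This assumes the number of rows in the table is a multiple of n; if it is not, add some extra logic after the loop to handle the leftover items.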
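
One way to handle the leftover items is to average whatever remains in the final, shorter group. A minimal sketch in plain Python, where `Row` is a hypothetical stand-in for the model instances and `average_every_n` is an illustrative name:

```python
from collections import namedtuple

# Hypothetical stand-in for model rows; real code would iterate a queryset.
Row = namedtuple("Row", ["val1", "val2"])


def average_every_n(rows, n):
    """Average val1 and val2 over consecutive groups of n rows.

    A trailing group with fewer than n rows is averaged over its own size.
    """
    result_list = []
    sum1 = sum2 = count = 0
    for row in rows:
        sum1 += row.val1
        sum2 += row.val2
        count += 1
        if count == n:
            result_list.append([sum1 / float(n), sum2 / float(n)])
            sum1 = sum2 = count = 0
    if count:  # leftover rows that did not fill a whole group
        result_list.append([sum1 / float(count), sum2 / float(count)])
    return result_list


rows = [Row(5, 6), Row(6, 4), Row(3, 1), Row(8, 4), Row(2, 6), Row(8, 2), Row(1, 1)]
print(average_every_n(rows, 3))
```

Whether a partial group should be averaged, dropped, or reported separately depends on what the summary is for; dropping it just means removing the final `if` block.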

score 0 · Accepted Answer

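I guess aggregation is the Django way of doing this, but the suggested example would produce a lot of queries, as Behesti said. My guess is that the django ORM really isn't built for number crunching (but I could be wrong here!)

I would probably go with numpy (if you have really huge arrays, I guess you need to do some partitioning):

The upside of numpy is that it is usually much faster than "standard" python operations; the downside is that it is an extra dependency.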

import numpy

raw_array = [  # This is for testing the code, use .values_list( 'val1', 'val2' ) with db
    [1, 5, 6],
    [2, 6, 4],
    [3, 3, 1],
    [4, 8, 4],
    [5, 2, 6],
    [6, 8, 2],
    [7, 1, 1],
]

arr = numpy.array(raw_array)

def sum_up_every_n_items(arr, n):
    # Despite the name, this returns the mean of every n consecutive rows
    groups = arr.shape[0] // n  # Number of complete groups; numpy.zeros needs integer dimensions
    res = numpy.zeros((groups, arr.shape[1]))
    arr = arr[0:groups * n, :]  # Truncate, take only full N items
    for loop in range(0, n):  # Note: this is loop 0,1,2 if n=3 ! We do addition with numpy vectors!
        res = res + arr[loop::n, :]  # Get every n'th row, starting at offset loop
    res = res / float(n)
    return res

res = sum_up_every_n_items(arr, n=3)
print(res)

Output

[[ 2.          4.66666667  3.66666667]
 [ 5.          6.          4.        ]]
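
The same averages can also be computed without an explicit loop by reshaping the array. This is an alternative sketch rather than the method used in this answer; `mean_every_n` is an illustrative name and the data is the same test array:

```python
import numpy


def mean_every_n(arr, n):
    """Mean of every n consecutive rows, ignoring a trailing partial group."""
    groups = arr.shape[0] // n
    trimmed = arr[:groups * n]
    # Shape (groups, n, columns): axis 1 runs over the rows inside each group.
    return trimmed.reshape(groups, n, arr.shape[1]).mean(axis=1)


arr = numpy.array([
    [1, 5, 6],
    [2, 6, 4],
    [3, 3, 1],
    [4, 8, 4],
    [5, 2, 6],
    [6, 8, 2],
    [7, 1, 1],
])
res = mean_every_n(arr, 3)
print(res)
```

As in the loop version, the id column is averaged along with val1 and val2, and the seventh row is dropped because it does not fill a group.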

score 0 · Accepted Answer

Avoid doing lots of queries. Pull the data down once and do the rest of the work in Python:

n = 3  # your step
# list() evaluates the queryset once, so len() and indexing below cause no further queries
results = list(Model.objects.filter(query).only('id', 'val1', 'val2'))
avgs = {}
for i in range(0, len(results) - n + 1, n):  # full groups only; .count() could cost a second query
    avg1, avg2 = 0, 0
    for j in range(n):
        avg1 += results[i+j].val1/float(n)
        avg2 += results[i+j].val2/float(n)
    avgs[results[i].id] = (avg1, avg2)  # Why is id relevant at all here?
