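I'm trying to use the Django ORM's aggregation features to run a query against an MSSQL 2008 R2 database, but I keep getting a timeout error. The failing query (generated by Django) is below. I tried running it directly in SQL Server Management Studio, where it works but takes 3.5 minutes.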

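It does seem to be aggregating over a bunch of fields it doesn't need, but I wouldn't have thought that alone would make it take this long. The database isn't very big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?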

SELECT 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined], 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count], 
    COUNT(T3.[id]) AS [assigned_tickets__count], 
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count] 
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id], 
    [auth_user].[password], 
    [auth_user].[last_login], 
    [auth_user].[is_superuser], 
    [auth_user].[username], 
    [auth_user].[first_name], 
    [auth_user].[last_name], 
    [auth_user].[email], 
    [auth_user].[is_staff], 
    [auth_user].[is_active], 
    [auth_user].[date_joined] 
HAVING 
    (COUNT([tickets_ticket].[id]) > 0  OR COUNT(T3.[id]) > 0 )

Edit:

Here are the relevant indexes (excluding those not used in the query):

auth_user.id                       (PK)
auth_user.username                 (Unique)
tickets_ticket.id                  (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id         (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id

Edit 2:

After some experimenting, I found that the following is the minimal query that still runs slowly:

SELECT 
    COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
    COUNT(T3.[id]) AS [assigned_tickets__count],
    COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM 
    [auth_user] 
    LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id]) 
    LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id]) 
    LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id]) 
GROUP BY 
    [auth_user].[id]

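Strangely, if I comment out any two of the lines above, it runs in under 1 second, and it doesn't seem to matter which lines I remove (though obviously I can't remove a join without also removing the corresponding SELECT line).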

Edit 3:

The Python code that generates this is:

User.objects.annotate(
    Count('tickets_captured'), 
    Count('assigned_tickets'), 
    Count('tickets_watched')
)

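Looking at the execution plan, SQL Server first cross-joins all the tables, producing about 280 million rows and 6 GB of data. I assume this is the problem, but why does it happen?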

2 Answers

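SQL Server is doing exactly what it was asked to do. Unfortunately, Django isn't generating the right query for what you want. It looks like you need a distinct count rather than a plain count; see "Django annotate() multiple times causes wrong answers".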

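As for why the query behaves this way: it says to join the four tables together. Suppose a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets. The join will return 2*3*4 = 24 rows for that user, one for each combination of tickets, so every plain COUNT sees 24. The distinct part removes all the duplicates.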
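
The row multiplication described above can be reproduced on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 rather than SQL Server, and the sample rows and the aliases c, r, and w are made up for illustration. User 1 has 2 captured, 3 assigned, and 4 watched tickets. On the Django side, the distinct count would presumably be requested with Count('tickets_captured', distinct=True) and so on.

```python
import sqlite3

# In-memory database with a cut-down version of the three tables (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
    INSERT INTO auth_user (id) VALUES (1), (2);
    -- User 1 captured tickets 1-2 and is responsible for tickets 3-5.
    INSERT INTO tickets_ticket (id, capturer_id, responsible_id) VALUES
        (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
    -- User 1 watches four tickets; user 2 watches none.
    INSERT INTO tickets_ticket_watchers (ticket_id, user_id) VALUES
        (1, 1), (2, 1), (3, 1), (4, 1);
""")

# Same join shape as the Django-generated query; {d} is either "" or "DISTINCT".
QUERY = """
    SELECT u.id,
           COUNT({d} c.id), COUNT({d} r.id), COUNT({d} w.ticket_id)
    FROM auth_user u
    LEFT OUTER JOIN tickets_ticket c ON u.id = c.capturer_id
    LEFT OUTER JOIN tickets_ticket r ON u.id = r.responsible_id
    LEFT OUTER JOIN tickets_ticket_watchers w ON u.id = w.user_id
    GROUP BY u.id
    ORDER BY u.id
"""

naive = conn.execute(QUERY.format(d="")).fetchall()
distinct = conn.execute(QUERY.format(d="DISTINCT")).fetchall()

print(naive)     # [(1, 24, 24, 24), (2, 6, 6, 0)]
print(distinct)  # [(1, 2, 3, 4), (2, 3, 2, 0)]
```

Note that COUNT(DISTINCT ...) only fixes the numbers. The database still has to build every combination of rows before collapsing them, so this alone may not remove the slowness.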

Answered 2013-06-28T15:09:36.347

How about this?

SELECT auth_user.*, 
   C1.tickets_captured__count,
   C2.assigned_tickets__count,
   C3.tickets_watched__count

FROM 
auth_user
LEFT JOIN
( SELECT  capturer_id, COUNT(*) AS tickets_captured__count 
  FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT  responsible_id, COUNT(*) AS assigned_tickets__count 
  FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT  user_id, COUNT(*) AS tickets_watched__count 
  FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id

WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null   -- also works (I think with better performance)
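
The pre-aggregation idea in the query above can be checked on a small data set. This is a minimal sketch using Python's built-in sqlite3 instead of SQL Server, with invented sample rows. The column alias n and the COALESCE calls (which turn a missing count into 0) are additions for the example and are not part of the query above.

```python
import sqlite3

# In-memory database with a cut-down version of the three tables (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets_ticket (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE tickets_ticket_watchers (
        id INTEGER PRIMARY KEY, ticket_id INTEGER, user_id INTEGER);
    -- User 3 has no tickets at all and should be filtered out.
    INSERT INTO auth_user (id) VALUES (1), (2), (3);
    INSERT INTO tickets_ticket (id, capturer_id, responsible_id) VALUES
        (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
    INSERT INTO tickets_ticket_watchers (ticket_id, user_id) VALUES
        (1, 1), (2, 1), (3, 1), (4, 1);
""")

# Each derived table is grouped down to one row per user *before* joining,
# so the joins cannot multiply rows against each other.
rows = conn.execute("""
    SELECT u.id,
           COALESCE(c1.n, 0), COALESCE(c2.n, 0), COALESCE(c3.n, 0)
    FROM auth_user u
    LEFT JOIN (SELECT capturer_id, COUNT(*) AS n
               FROM tickets_ticket GROUP BY capturer_id) AS c1
           ON u.id = c1.capturer_id
    LEFT JOIN (SELECT responsible_id, COUNT(*) AS n
               FROM tickets_ticket GROUP BY responsible_id) AS c2
           ON u.id = c2.responsible_id
    LEFT JOIN (SELECT user_id, COUNT(*) AS n
               FROM tickets_ticket_watchers GROUP BY user_id) AS c3
           ON u.id = c3.user_id
    WHERE c1.n > 0 OR c2.n > 0
    ORDER BY u.id
""").fetchall()

print(rows)  # [(1, 2, 3, 4), (2, 3, 2, 0)]
```

Because each subquery yields at most one row per user, the outer join produces at most one row per user, and the counts come out correct without DISTINCT.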
Answered 2013-06-28T14:14:07.010