query-optimization - Question about optimizing a slow query (SQL included)

Question

SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name", 
  "myapp_profile"."age", "auth_user"."id", "auth_user"."username", 
  "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", 
  "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", 
  "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" 
FROM "myapp_profile" 
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id") 
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id") 
WHERE ("auth_user"."is_superuser" = false 
AND "auth_user"."is_staff" = false 
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))) 
ORDER BY "myapp_profile"."user_id" 
DESC LIMIT 100

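The query above takes about 100 seconds to run against 2 million users/profiles. I'm not a DBA, and our DBAs are looking into what can be done, but since I will probably never get to see what changes (assuming the fix happens at the DB level), I'm curious how you would optimize this query. It obviously needs to be a ton faster than it is, on the order of 5 seconds or less. If there is no way to optimize the SQL itself, is there an index (or several) that could be added or changed to make the query faster, or is there something else I'm overlooking?

The database is Postgres 9, and the query is generated by Django's ORM.

Query plan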

Limit (cost=1374.35..1383.10 rows=100 width=106)
  -> Unique (cost=1374.35..1391.24 rows=193 width=106)
    -> Sort (cost=1374.35..1374.83 rows=193 width=106)
         Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
      -> Nested Loop (cost=453.99..1367.02 rows=193 width=106)
        -> Hash Left Join (cost=453.99..1302.53 rows=193 width=49)
             Hash Cond: (myapp_siterel.site_id = django_site.id)
             Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
          -> Hash Left Join (cost=448.50..1053.27 rows=15001 width=53)
               Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
            -> Seq Scan on myapp_profile (cost=0.00..286.01 rows=15001 width=49)
            -> Hash (cost=261.00..261.00 rows=15000 width=8)
              -> Seq Scan on myapp_siterel (cost=0.00..261.00 rows=15000 width=8)
          -> Hash (cost=3.55..3.55 rows=155 width=4)
            -> Seq Scan on django_site (cost=0.00..3.55 rows=155 width=4)
        -> Index Scan using auth_user_pkey on auth_user (cost=0.00..0.32 rows=1 width=57)
             Index Cond: (auth_user.id = myapp_profile.user_id)
             Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))

Thanks

score 2 · Accepted Answer

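I'm not overly familiar with Postgres, so I'm not sure how good its query optimizer is, but it looks like everything in your WHERE clause could be a join condition instead. I would hope Postgres is clever enough to work that out for itself. If it is not, it will fetch all 2 million users together with the related records from the other three tables, and only then filter them with your WHERE.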

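The indexes already mentioned should also help if they don't exist yet. Again, I'm more of an MSSQL person, but doesn't Postgres have some kind of statistics profile or query plan you can look at?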
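Postgres does expose this through EXPLAIN. A minimal sketch, assuming Postgres 9.0 or later (the parenthesized option syntax and BUFFERS arrived in 9.0); the aliases p, u, r, s and the shortened column list are mine, only there to keep the example compact. Unlike plain EXPLAIN, which prints estimates only, ANALYZE really executes the query and reports actual row counts and timings per node, so expect it to take as long as the slow query itself.

```sql
-- Run this against the production-sized data: the planner may choose
-- a very different plan for 2 million rows than for a small test copy.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT p.user_id, p.name, p.age
FROM myapp_profile p
INNER JOIN auth_user u ON p.user_id = u.id
LEFT OUTER JOIN myapp_siterel r ON p.user_id = r.profile_id
LEFT OUTER JOIN django_site s ON r.site_id = s.id
WHERE u.is_superuser = false
  AND u.is_staff = false
  AND (s.id IS NULL OR s.id IN (15, 16))
ORDER BY p.user_id DESC
LIMIT 100;
```

In the output, look for the nodes where the actual time jumps, and for places where the estimated row count is far from the actual one.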

Something along these lines, with filters moved into the join conditions:

SELECT DISTINCT
    "myapp_profile"."user_id",
    "myapp_profile"."name", 
    "myapp_profile"."age",
    "auth_user"."id",
    "auth_user"."username", 
    "auth_user"."first_name",
    "auth_user"."last_name",
    "auth_user"."email", 
    "auth_user"."password",
    "auth_user"."is_staff",
    "auth_user"."is_active", 
    "auth_user"."is_superuser",
    "auth_user"."last_login",
    "auth_user"."date_joined" 
FROM "myapp_profile" 
    INNER JOIN "auth_user"
        ON ("myapp_profile"."user_id" = "auth_user"."id") 
        AND "auth_user"."is_superuser" = false
        AND "auth_user"."is_staff" = false 
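    LEFT OUTER JOIN "myapp_siterel"
        ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
    LEFT OUTER JOIN "django_site"
        ON ("myapp_siterel"."site_id" = "django_site"."id") 
-- The site filter has to stay in WHERE: inside a LEFT JOIN's ON clause it
-- would only null out non-matching sites instead of removing profile rows.
WHERE ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100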

Also, do you need the DISTINCT? That will slow it down a bit as well.
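On the DISTINCT point: the duplicates presumably come from the join to myapp_siterel, which yields one row per site relation. Here is a sketch of a variant that checks site membership with EXISTS, so each profile appears at most once and no DISTINCT is needed. It rests on assumptions that are not confirmed in the question: myapp_profile.user_id is unique, and myapp_siterel.site_id is NOT NULL and always points at an existing django_site row. The aliases p, u, r are mine. It is untested against the real schema, so compare its results with the original query before relying on it.

```sql
SELECT p.user_id, p.name, p.age,
       u.id, u.username, u.first_name, u.last_name, u.email,
       u.password, u.is_staff, u.is_active, u.is_superuser,
       u.last_login, u.date_joined
FROM myapp_profile p
INNER JOIN auth_user u ON u.id = p.user_id
WHERE u.is_superuser = false
  AND u.is_staff = false
  AND (
        -- profile is attached to no site at all ...
        NOT EXISTS (SELECT 1
                    FROM myapp_siterel r
                    WHERE r.profile_id = p.user_id)
        -- ... or to at least one of the wanted sites
        OR EXISTS (SELECT 1
                   FROM myapp_siterel r
                   WHERE r.profile_id = p.user_id
                     AND r.site_id IN (15, 16))
      )
ORDER BY p.user_id DESC
LIMIT 100;
```

Without the DISTINCT there is no need to sort the full result on every selected column, so the planner may be able to walk the index on user_id backwards and stop once it has 100 rows. Whether it actually does that is something to confirm with EXPLAIN.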

score 1 · Accepted Answer

The basics:

Make sure all of the user ID fields are indexed.

It also looks like you would benefit from indexes on is_superuser and is_staff.
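To see which indexes already exist, the pg_indexes system view can be queried. A small sketch, assuming the four tables live in a schema on the search path:

```sql
-- List every index defined on the tables involved in the query.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('myapp_profile', 'myapp_siterel',
                    'auth_user', 'django_site')
ORDER BY tablename, indexname;
```

Primary keys and unique constraints show up here too, since Postgres backs each of them with an index.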

score 1 · Accepted Answer

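There is never a straightforward silver bullet for query optimization, but the obvious first step is to index the columns you are searching on, which in your case are: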

"auth_user"."is_superuser"
"auth_user"."is_staff"
"django_site"."id"
"myapp_profile"."user_id"
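A sketch of what such indexes could look like. The index names are hypothetical, and some of these may already exist: Postgres indexes every primary key automatically (the plan shows auth_user_pkey), so "django_site"."id" and "myapp_profile"."user_id" need nothing extra if they are primary keys. A plain index on a boolean column is rarely selective enough for the planner to use, so the flags are shown as a partial index instead. That is my substitution, not something the list above prescribes.

```sql
-- Join column used to look up site relations per profile.
CREATE INDEX myapp_siterel_profile_id_idx
    ON myapp_siterel (profile_id);

-- Partial index: only covers rows for ordinary (non-staff,
-- non-superuser) accounts, which is what the query filters on.
CREATE INDEX auth_user_regular_idx
    ON auth_user (id)
    WHERE NOT is_superuser AND NOT is_staff;

-- Refresh planner statistics so the new indexes are costed properly.
ANALYZE myapp_siterel;
ANALYZE auth_user;
```

If nearly all users are ordinary accounts, the partial index will not narrow much and the planner may ignore it, so check the plan again after creating it.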
