86

我有两个这样的模型:

class User(models.Model):
    email = models.EmailField()

class Report(models.Model):
    user = models.ForeignKey(User)

实际上,每个模型都有更多与这个问题无关的字段。

我想过滤所有拥有以“a”开头的电子邮件但没有报告的用户。将有更多基于其他领域的标准.filter().exclude()

我想这样处理它:

users = User.objects.filter(email__like = 'a%')

users = users.filter(<other filters>)

users = ???

我想 ???过滤掉没有与之关联的报告的用户。我该怎么做?如果这不可能,因为我已经提出了它,那么另一种方法是什么?

4

7 回答 7

114

注意:这个答案是在 2013 年为 Django 1.5 编写的。请参阅其他答案以了解适用于较新版本 Django 的更好方法

使用isnull.

users_without_reports = User.objects.filter(report__isnull=True)
users_with_reports = User.objects.filter(report__isnull=False).distinct()

使用 时isnull=Falsedistinct()需要 以防止重复结果。

于 2013-02-12T11:58:26.627 回答
80

从 Django 3.0 开始,您现在可以直接在 a 中使用表达式filter(),删除不必要的 SQL 子句:

User.objects.filter(
    ~Exists(Reports.objects.filter(user__eq=OuterRef('pk'))),
    email__startswith='a'
)
SELECT user.pk, user.email
FROM user
WHERE NOT EXISTS (SELECT U0.pk FROM reports U0 WHERE U0.user = user.pk) AND email LIKE 'a%';

文件:


对于 Django 1.11+,您可以添加EXISTS子查询:

User.objects.annotate(
    no_reports=~Exists(Reports.objects.filter(user__eq=OuterRef('pk')))
).filter(
    email__startswith='a',
    no_reports=True
)

这会生成类似这样的 SQL:

SELECT
    user.pk,
    user.email,
    NOT EXISTS (SELECT U0.pk FROM reports U0 WHERE U0.user = user.pk) AS no_reports
FROM user
WHERE email LIKE 'a%' AND NOT EXISTS (SELECT U0.pk FROM reports U0 WHERE U0.user = user.pk);

NOT EXISTS子句几乎总是执行“不存在”过滤器的最有效方式。


于 2018-08-16T14:24:32.077 回答
12

在没有额外查询或联接的情况下获得本机 SQL EXISTS/NOT EXISTS 的唯一方法是将其作为原始 SQL 添加到 .extra() 子句中:

users = users.extra(where=[
    """NOT EXISTS(SELECT 1 FROM {reports} 
                  WHERE user_id={users}.id)
    """.format(reports=Report._meta.db_table, users=User._meta.db_table)
])

事实上,这是一个非常明显和有效的解决方案,我有时想知道为什么它没有内置到 Django 作为查找。它还允许细化子查询以查找例如仅在上周有[out] 报告的用户,或[out] 未答复/未查看报告的用户。

于 2016-08-10T17:41:23.490 回答
4

除了@OrangeDog 答案。从 Django 3.0开始,您可以使用Exists子查询直接过滤查询集:

User.objects.filter(
    ~Exists(Reports.objects.filter(user__eq=OuterRef('pk'))
)
于 2020-01-31T07:33:11.083 回答
3

Alasdair 的回答很有帮助,但我不喜欢使用distinct(). 有时它可能很有用,但它通常是一种代码味道,告诉你你搞砸了你的连接。

幸运的是,Django 的查询允许您过滤子查询。使用 Django 3.0,您还可以使用exists 子句

以下是从您的问题运行查询的几种方法:

# Tested with Django 3.0 and Python 3.6
import logging
import sys

import django
from django.apps import apps
from django.apps.config import AppConfig
from django.conf import settings
from django.db import connections, models, DEFAULT_DB_ALIAS
from django.db.models import Exists, OuterRef
from django.db.models.base import ModelBase

NAME = 'udjango'
DB_FILE = NAME + '.db'


def main():
    setup()

    class User(models.Model):
        email = models.EmailField()

        def __repr__(self):
            return 'User({!r})'.format(self.email)

    class Report(models.Model):
        user = models.ForeignKey(User, on_delete=models.CASCADE)

    syncdb(User)
    syncdb(Report)

    anne = User.objects.create(email='anne@example.com')
    User.objects.create(email='adam@example.com')
    alice = User.objects.create(email='alice@example.com')
    User.objects.create(email='bob@example.com')

    Report.objects.create(user=anne)
    Report.objects.create(user=alice)
    Report.objects.create(user=alice)

    logging.info('users without reports')
    logging.info(User.objects.filter(report__isnull=True, email__startswith='a'))

    logging.info('users with reports (allows duplicates)')
    logging.info(User.objects.filter(report__isnull=False, email__startswith='a'))

    logging.info('users with reports (no duplicates)')
    logging.info(User.objects.exclude(report__isnull=True).filter(email__startswith='a'))

    logging.info('users with reports (no duplicates, simpler SQL)')
    report_user_ids = Report.objects.values('user_id')
    logging.info(User.objects.filter(id__in=report_user_ids, email__startswith='a'))

    logging.info('users with reports (EXISTS clause, Django 3.0)')
    logging.info(User.objects.filter(
        Exists(Report.objects.filter(user_id=OuterRef('id'))),
        email__startswith='a'))

    logging.info('Done.')


def setup():
    with open(DB_FILE, 'w'):
        pass  # wipe the database
    settings.configure(
        DEBUG=True,
        DATABASES={
            DEFAULT_DB_ALIAS: {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': DB_FILE}},
        LOGGING={'version': 1,
                 'disable_existing_loggers': False,
                 'formatters': {
                    'debug': {
                        'format': '%(asctime)s[%(levelname)s]'
                                  '%(name)s.%(funcName)s(): %(message)s',
                        'datefmt': '%Y-%m-%d %H:%M:%S'}},
                 'handlers': {
                    'console': {
                        'level': 'DEBUG',
                        'class': 'logging.StreamHandler',
                        'formatter': 'debug'}},
                 'root': {
                    'handlers': ['console'],
                    'level': 'INFO'},
                 'loggers': {
                    "django.db": {"level": "DEBUG"}}})
    app_config = AppConfig(NAME, sys.modules['__main__'])
    apps.populate([app_config])
    django.setup()
    original_new_func = ModelBase.__new__

    @staticmethod
    def patched_new(cls, name, bases, attrs):
        if 'Meta' not in attrs:
            class Meta:
                app_label = NAME
            attrs['Meta'] = Meta
        return original_new_func(cls, name, bases, attrs)
    ModelBase.__new__ = patched_new


def syncdb(model):
    """ Standard syncdb expects models to be in reliable locations.

    Based on https://github.com/django/django/blob/1.9.3
    /django/core/management/commands/migrate.py#L285
    """
    connection = connections[DEFAULT_DB_ALIAS]
    with connection.schema_editor() as editor:
        editor.create_model(model)


main()

如果将其放入 Python 文件并运行它,您应该会看到如下内容:

2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys = OFF; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) BEGIN; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.schema.execute(): CREATE TABLE "udjango_user" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "email" varchar(254) NOT NULL); (params None)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) CREATE TABLE "udjango_user" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "email" varchar(254) NOT NULL); args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_key_check; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys = ON; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys = OFF; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) BEGIN; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.schema.execute(): CREATE TABLE "udjango_report" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "user_id" integer NOT NULL REFERENCES "udjango_user" ("id") DEFERRABLE INITIALLY DEFERRED); (params None)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) CREATE TABLE "udjango_report" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "user_id" integer NOT NULL REFERENCES "udjango_user" ("id") DEFERRABLE INITIALLY DEFERRED); args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_key_check; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.schema.execute(): CREATE INDEX "udjango_report_user_id_60bc619c" ON "udjango_report" ("user_id"); (params ())
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) CREATE INDEX "udjango_report_user_id_60bc619c" ON "udjango_report" ("user_id"); args=()
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) PRAGMA foreign_keys = ON; args=None
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.017) INSERT INTO "udjango_user" ("email") VALUES ('anne@example.com'); args=['anne@example.com']
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.023) INSERT INTO "udjango_user" ("email") VALUES ('adam@example.com'); args=['adam@example.com']
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.022) INSERT INTO "udjango_user" ("email") VALUES ('alice@example.com'); args=['alice@example.com']
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.022) INSERT INTO "udjango_user" ("email") VALUES ('bob@example.com'); args=['bob@example.com']
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.029) INSERT INTO "udjango_report" ("user_id") VALUES (1); args=[1]
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.033) INSERT INTO "udjango_report" ("user_id") VALUES (3); args=[3]
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.033) INSERT INTO "udjango_report" ("user_id") VALUES (3); args=[3]
2019-12-06 11:45:17[INFO]root.main(): users without reports
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) SELECT "udjango_user"."id", "udjango_user"."email" FROM "udjango_user" LEFT OUTER JOIN "udjango_report" ON ("udjango_user"."id" = "udjango_report"."user_id") WHERE ("udjango_user"."email" LIKE 'a%' ESCAPE '\' AND "udjango_report"."id" IS NULL) LIMIT 21; args=('a%',)
2019-12-06 11:45:17[INFO]root.main(): <QuerySet [User('adam@example.com')]>
2019-12-06 11:45:17[INFO]root.main(): users with reports (allows duplicates)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) SELECT "udjango_user"."id", "udjango_user"."email" FROM "udjango_user" INNER JOIN "udjango_report" ON ("udjango_user"."id" = "udjango_report"."user_id") WHERE ("udjango_user"."email" LIKE 'a%' ESCAPE '\' AND "udjango_report"."id" IS NOT NULL) LIMIT 21; args=('a%',)
2019-12-06 11:45:17[INFO]root.main(): <QuerySet [User('anne@example.com'), User('alice@example.com'), User('alice@example.com')]>
2019-12-06 11:45:17[INFO]root.main(): users with reports (no duplicates)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) SELECT "udjango_user"."id", "udjango_user"."email" FROM "udjango_user" WHERE (NOT ("udjango_user"."id" IN (SELECT U0."id" FROM "udjango_user" U0 LEFT OUTER JOIN "udjango_report" U1 ON (U0."id" = U1."user_id") WHERE U1."id" IS NULL)) AND "udjango_user"."email" LIKE 'a%' ESCAPE '\') LIMIT 21; args=('a%',)
2019-12-06 11:45:17[INFO]root.main(): <QuerySet [User('anne@example.com'), User('alice@example.com')]>
2019-12-06 11:45:17[INFO]root.main(): users with reports (no duplicates, simpler SQL)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) SELECT "udjango_user"."id", "udjango_user"."email" FROM "udjango_user" WHERE ("udjango_user"."email" LIKE 'a%' ESCAPE '\' AND "udjango_user"."id" IN (SELECT U0."user_id" FROM "udjango_report" U0)) LIMIT 21; args=('a%',)
2019-12-06 11:45:17[INFO]root.main(): <QuerySet [User('anne@example.com'), User('alice@example.com')]>
2019-12-06 11:45:17[INFO]root.main(): users with reports (EXISTS clause, Django 3.0)
2019-12-06 11:45:17[DEBUG]django.db.backends.debug_sql(): (0.000) SELECT "udjango_user"."id", "udjango_user"."email" FROM "udjango_user" WHERE (EXISTS(SELECT U0."id", U0."user_id" FROM "udjango_report" U0 WHERE U0."user_id" = "udjango_user"."id") AND "udjango_user"."email" LIKE 'a%' ESCAPE '\') LIMIT 21; args=('a%',)
2019-12-06 11:45:17[INFO]root.main(): <QuerySet [User('anne@example.com'), User('alice@example.com')]>
2019-12-06 11:45:17[INFO]root.main(): Done.

您可以看到最终查询使用了所有内部联接。

于 2017-01-23T23:05:25.110 回答
1

要过滤没有与之关联的报告的用户,请尝试以下操作:

users = User.objects.exclude(id__in=[elem.user.id for elem in Report.objects.all()])

于 2013-02-12T11:55:21.273 回答
1

查找存在连接行的行的最佳选择:

Report.objects.filter(user__isnull=False).distinct()

这使用一个INNER JOIN(然后冗余检查User.id不为空)。

查找没有连接行的行的最佳选择:

Report.objects.filter(user__isnull=True)

这使得LEFT OUTER JOIN, then checksUser.id不为空。

基于连接的查询将比子查询更快,因此这比新可用的选项(例如 Django >= 3)更快,用于查找没有连接行的行:

Report.objects.filter(~Exists(User.objects.filter(report=OuterRef('pk'))))

这会创建一个WHERE NOT EXISTS (SELECT .. FROM User..)因此涉及一个可能很大的中间结果集(感谢@Tomasz Gandor)。

这对于 Django <3,filter()不能传递子查询,也使用子查询,所以速度较慢:

Report.objects.annotate(
    no_users=~Exists(User.objects.filter(report=OuterRef('pk')))
).filter(no_users=True)

这可以与子查询结合使用。在此示例中,aTextbook具有多个Versions(即,version具有textbook_id),并且 aversion具有多个Pages(即,page具有version_id)。子查询获取与页面相关联的每本教科书的最新版本:

subquery = (
    Version.objects
        .filter(
            # OuterRef joins to Version.textbook in outer query below
            textbook=OuterRef('textbook'), 
            # excludes rows with no joined Page records
            page__isnull=False)
        # ordered so [:1] below gets highest (ie, latest) version number
        .order_by('-number').distinct()
)
# Only the Version.ids of the latest versions that have pages returned by the subquery
books = Version.objects.filter(pk=Subquery(subquery.values('pk')[:1])).distinct()

要返回连接到两个表中的一个或两个表的行,请使用 Q 对象(Page并且TextMarkup两者都有可空的外键连接到File):

from django.db.models import Q

File.objects.filter(Q(page__isnull=False) | Q(textmarkup__isnull=False).distinct()
于 2020-05-02T16:38:25.080 回答