sql - SQL: Select records not previously seen by a given user

Question

I have a Postgres database with a schema similar to this:

CREATE TABLE authors (
    id integer NOT NULL
);

CREATE TABLE posts (
    id integer NOT NULL,
    author_id integer,
    text text
);

CREATE TABLE comments (
    id integer NOT NULL,
    post_id integer,
    ordinal integer DEFAULT 0,
    author_id integer
);

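Given a particular author_id, I would like to be able to select a batch of 20 posts:

1. Not including any post that has a comment by that author.
2. Not a post by that author.
3. Whether to include the 10 most recent comments for that post.

I think point 1 is what is killing my query time. So far I have been handling it with an inner query, like this: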

SELECT * from posts
WHERE posts.id NOT IN (
   SELECT posts.id FROM posts JOIN comments ON posts.id = comments.post_id)

As my database grows, this query keeps getting slower. I'm not good at SQL; is there a better way? I'm using ActiveRecord, if that helps or hurts.

score 0 · Accepted Answer

Replace your query with this:

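-- Aliases fixed so the statement parses. Note that it pairs each post with every
-- comment belonging to a different post; it does not exclude commented posts.
SELECT * FROM posts p, comments c
WHERE p.id <> c.post_id;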

Hope this helps.

score 0 · Accepted Answer

In my experience, NOT IN kills query time more than anything else. Alternatives are NOT EXISTS or, if PostgreSQL supports it, this syntax:

where somefield in 
(select somefield
 from etc
 except
 select somefield
 from etc)

Some databases use the keyword MINUS instead of EXCEPT.
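
Applied to the schema in the question, the EXCEPT pattern might look like the sketch below. It assumes PostgreSQL, which does support EXCEPT, and the literal 42 is only a placeholder for the author id.

```sql
-- Sketch: posts that author 42 (placeholder id) has neither written nor commented on.
SELECT p.*
FROM   posts p
WHERE  p.id IN (
   SELECT id FROM posts
   WHERE  author_id IS DISTINCT FROM 42   -- not written by the author
   EXCEPT
   SELECT post_id FROM comments
   WHERE  author_id = 42                  -- drop posts the author commented on
   )
LIMIT  20;
```

Whether this beats NOT EXISTS depends on the data and the planner, so it is worth comparing both with EXPLAIN ANALYZE.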

score 0 · Accepted Answer

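Debugging the presented query

The query you presented is needlessly inefficient. For a start, you can simplify it by dropping the redundant JOIN in the subquery: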

SELECT *
FROM   posts
WHERE  posts.id NOT IN (SELECT post_id FROM comments)

This can be rewritten as LEFT JOIN / IS NULL or with a NOT EXISTS anti-semi-join, which I expect to perform best:

SELECT *
FROM   posts p
WHERE  NOT EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id)
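
For comparison, the LEFT JOIN / IS NULL form could be written as follows. This is a sketch against the question's schema, and it should return the same rows as the NOT EXISTS version:

```sql
-- Anti-join via LEFT JOIN: keep only posts for which no comment row matched.
SELECT p.*
FROM   posts p
LEFT   JOIN comments c ON c.post_id = p.id
WHERE  c.post_id IS NULL;   -- the join column is NULL only when nothing matched
```

Unlike NOT IN, neither of these forms returns an empty result when comments.post_id contains a NULL value, which the schema as shown allows.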

Full query

Your point 3 is unclear:

Whether to include the 10 most recent comments for that post.

Ignoring that, the query could be:

SELECT *
FROM   posts p
WHERE  p.author_id <> $author_id -- "not a post by that author"
AND    NOT EXISTS (
   SELECT 1
   FROM   comments c
   WHERE  c.author_id = $author_id
   AND    c.post_id = p.id) -- exclude "post that has a comment by that author"
--  ORDER  BY ??undefined??, maybe id DESC
LIMIT  20

posts.author_id should be defined NOT NULL, or you have to use:

p.author_id IS DISTINCT FROM $author_id
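
The difference only matters for rows where author_id is NULL. A quick check that can be run in psql, with 42 as a placeholder author id:

```sql
-- Compare plain inequality with the NULL-safe form for a NULL author_id.
SELECT NULL::integer <> 42               AS plain_inequality,  -- NULL, so WHERE drops the row
       NULL::integer IS DISTINCT FROM 42 AS null_safe;         -- true, so the row is kept
```

With the plain <> comparison, posts without an author would silently disappear from the result.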

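With many rows, indexes are the key to performance. I expect posts.id to be the primary key, so it is indexed automatically. If you don't have one yet, add this multicolumn index, which lets the NOT EXISTS check look up comments by post and author directly: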

CREATE INDEX comments_pa_idx ON comments (post_id, author_id);
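
If point 3 is taken to mean "also return up to 10 of the latest comments for each selected post", one possible sketch uses LEFT JOIN LATERAL (PostgreSQL 9.3 or later). It rests on several assumptions that the question does not confirm: 42 is a placeholder author id, posts are ordered by id DESC, and a higher ordinal means a newer comment, since the schema has no timestamp column.

```sql
-- Sketch: 20 unseen posts, each with up to 10 of its latest comments.
SELECT p.id AS post_id, p.text, c.id AS comment_id, c.ordinal
FROM  (
   SELECT *
   FROM   posts p
   WHERE  p.author_id IS DISTINCT FROM 42   -- not a post by that author
   AND    NOT EXISTS (
      SELECT 1
      FROM   comments c
      WHERE  c.author_id = 42
      AND    c.post_id = p.id)              -- no comment by that author
   ORDER  BY p.id DESC                      -- assumed ordering
   LIMIT  20
   ) p
LEFT   JOIN LATERAL (
   SELECT c.id, c.ordinal
   FROM   comments c
   WHERE  c.post_id = p.id
   ORDER  BY c.ordinal DESC                 -- assumed: higher ordinal = newer
   LIMIT  10
   ) c ON true                              -- keep posts that have no comments
ORDER  BY p.id DESC, c.ordinal DESC;
```

The result has one row per post and comment pair, so the application would still need to group rows by post_id.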
