
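While trying to pull a large number of columns (~15-20) from several joined tables, I put together two views that pull the necessary information. Joining these views works fine on my local database (which has only ~1k rows in posts); but when I created the same views on our production database (~30k rows in posts) and tried joining against them, I realized the solution would not scale beyond the test dataset.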

I then tried to combine these 2 views (the category data, such as categories.title, and the creator's data, such as users.display_name) into a single post_data CTE.

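I have put together a sample DBFiddle with some test data to explain the table structure. The actual data has more columns, but this represents the joins needed to build the query.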

table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id  | parent_id | created_by |                 message                  |              attachments               |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
|  8  | NULL      |          8 | laptop for sale                          | [{"media_id": 1380}]                   |
|  9  | NULL      |          4 | NEW lamp shade up for grabs              | [{"media_id": 1442}, {"link_id": 103}] |
|  10 | 1         |          7 | Oooh I could be interested               |                                        |
|  11 | 1         |          7 | DMing you now! I've been looking for one |                                        |
+-----+-----------+------------+------------------------------------------+----------------------------------------+

table : users
+----+------------------+---------------------------+
| id |   display_name   |        created_at         |
+----+------------------+---------------------------+
|  1 | John Appleseed   | 2018-02-20T00:00:00+00:00 |
|  2 | Massimo Jenkins  | 2018-05-14T00:00:00+00:00 |
|  3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
|  4 | Jackson Creek    | 2018-11-15T00:00:00+00:00 |
|  5 | Joe Schmoe       | 2019-01-09T00:00:00+00:00 |
|  6 | John Johnson     | 2019-02-14T00:00:00+00:00 |
|  7 | Donna Madison    | 2019-05-14T00:00:00+00:00 |
|  8 | Jenna Kaplan     | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+

table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by |   title    |                      description                      |
+----+------------+------------+-------------------------------------------------------+
|  1 |          2 | Technology | Anything tech; Consumer, business or education tools! |
|  2 |          2 | Home Goods | Anything for the home                                 |
+----+------------+------------+-------------------------------------------------------+

table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
|       8 |           1 |
|       9 |           1 |
|      10 |           1 |
|      11 |           1 |
+---------+-------------+

table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
|       1 |           1 |
|       2 |           1 |
|       3 |           1 |
|       4 |           1 |
+---------+-------------+

table : posts_removed
+---------+----------------------+------------+
| post_id |      removed_at      | removed_by |
+---------+----------------------+------------+
|      10 |  2019-01-22 09:08:14 |          7 |
+---------+----------------------+------------+

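In the query below, the eligible posts are determined in the base SELECT; the post_data CTE is then joined to that result set (limited to 25 rows), and all columns from the CTE are returned.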

WITH post_data AS (
    SELECT posts.id,
           posts.parent_id,
           posts.created_by,
           posts.attachments,
           categories_posts.category_id,
           categories.title,
           categories.created_by AS category_created_by,
           creator.display_name AS creator_display_name,
           creator.created_at AS creator_created_at
           /* ... And a whole bunch of other fields from posts, categories_posts, users */
    FROM posts
    LEFT OUTER JOIN categories_posts
        ON categories_posts.post_id = posts.id
    LEFT OUTER JOIN categories
        ON categories.id = categories_posts.category_id
    LEFT OUTER JOIN users creator
        ON creator.id = posts.created_by
    /* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
        /* Set up the criteria for the posts selected before getting their data from the CTE */
    LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
    LEFT OUTER JOIN users user_me ON user_me.id = "1"
    LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
        AND users_followed.followed_by = user_me.id
    LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
    LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
    LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
    /* Join our post_data on the post's ID */
    JOIN post_data ON post_data.id = posts.id
WHERE
(
    (
        users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
    ) OR categories_posts.category_id IS NULL
) AND (
    posts.created_by = user_me.id
    OR users_followed.followed_by = user_me.id
    OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
    AND pp_removed.removed_at IS NULL
    AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25

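In theory, I thought this would work by selecting the rows that match the base selection criteria and then doing an index scan of the CTE by post ID; however, the query optimizer seems to choose a full table scan of the posts table instead.

EXPLAIN SELECT gave me this information: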

+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type |         table          |  type  |         possible_keys         |     key     | key_len |                     ref                     |  rows  | filtered |                       extra                        |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
|  1 | PRIMARY     | posts                  | ALL    | PRIMARY,parent_id,created_by  |             |         |                                             |  33870 |      100 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | removed                | eq_ref | PRIMARY                       | PRIMARY     |       8 | posts.id                                    |      1 |       19 | Using where                                        |
|  1 | PRIMARY     | user_me                | const  | PRIMARY                       | PRIMARY     |       8 | const                                       |      1 |      100 | Using where; Using index                           |
|  1 | PRIMARY     | categories_posts       | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 |                                                    |
|  1 | PRIMARY     | categories             | eq_ref | PRIMARY                       | PRIMARY     |       8 | categories_posts.category_id                |      1 |      100 | Using index                                        |
|  1 | PRIMARY     | users_categories       | eq_ref | user_id_2,user_id,category_id | user_id_2   |      16 | user_me.id,api.categories_posts.category_id |      1 |      100 | Using where                                        |
|  1 | PRIMARY     | users_followed         | eq_ref | user_id,followed_by           | user_id     |      16 | posts.created_by,api.user_me.id             |      1 |      100 | Using where; Using index                           |
|  1 | PRIMARY     | pp_removed             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.parent_id                         |      1 |       19 | Using where                                        |
|  1 | PRIMARY     | <derived2>             | ALL    |                               |             |         |                                             | 493911 |       19 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | posts                  | ALL    |                               |             |         |                                             |  33870 |      100 | Using temporary                                    |
|  2 | DERIVED     | categories_posts       | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 |                                                    |
|  2 | DERIVED     | categories             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.categories_posts.category_id            |      1 |      100 |                                                    |
|  2 | DERIVED     | posts_votes            | ref    | post_id                       | post_id     |       8 | api.posts.id                                |      1 |      100 | Using index                                        |
|  2 | DERIVED     | pp                     | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.parent_id                         |      1 |      100 |                                                    |
|  2 | DERIVED     | pp_removed             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.pp.id                                   |      1 |      100 | Using index                                        |
|  2 | DERIVED     | removed                | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 | Using index                                        |
|  2 | DERIVED     | creator                | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.created_by                        |      1 |      100 |                                                    |
|  2 | DERIVED     | usernames              | ref    | user_id                       | user_id     |       8 | api.creator.id                              |      1 |      100 |                                                    |
|  2 | DERIVED     | verifications          | ALL    |                               |             |         |                                             |      4 |      100 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | categories_identifiers | ref    | category_id                   | category_id |       8 | api.categories.id                           |      1 |      100 |                                                    |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+

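Beyond that, I have tried restructuring the query to force the use of a key on the posts table, for example by using FORCE INDEX(PRIMARY) in the select, and by moving the CTE to be the base query and adding a filter WHERE id IN ({the original base query}); but the optimizer still seems to do a full table scan.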

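In case it helps decode what is happening in the query plan:

  • At the time of writing, posts has 33,387 rows
  • The query plan shows a full table scan returning 33,870 rows
  • The query plan also shows the derived table (<derived2>) as having 493,911 rows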

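My core questions are:

  1. Am I correct in saying that the subquery should be executed only once for each result row from the base select query? If so, shouldn't the CTE also use the JOIN on posts.id and potentially use the table's index?

  2. Why does the query plan show 33,870 rows selected when the table has only 33,387 rows? And where do the 493,911 rows come from?

  3. How can I prevent the full table scan in this case?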

1 Answer

Try this: do the LIMIT 25 before JOINing to the WITH (the post_data CTE).

SELECT * FROM
    ( SELECT ... FROM posts
               JOIN categories_posts ...
        ORDER BY posts.id DESC
        LIMIT 25 ) AS x
    JOIN post_data
       ON post_data.id IN (x.id, x.parent_id)
    ORDER BY x.id DESC
answered 2020-01-27T01:51:11.533
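
To make the shape of that rewrite concrete, here is a minimal sketch of the "LIMIT first, then join" pattern. It rests on several assumptions that are not from the question: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, the schema is trimmed to a few columns, the only selection criterion kept is "not removed", the limit is 2 instead of 25, and parent_id is set to 8 for the two replies (the sample data shows 1) so that the parent row exists. It only checks which rows come back; it says nothing about the plan MySQL would choose.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, parent_id INTEGER,
                    created_by INTEGER, message TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE categories_posts (post_id INTEGER PRIMARY KEY, category_id INTEGER);
CREATE TABLE posts_removed (post_id INTEGER PRIMARY KEY, removed_at TEXT);

INSERT INTO users VALUES (4, 'Jackson Creek'), (7, 'Donna Madison'),
                         (8, 'Jenna Kaplan');
-- parent_id is 8 here (hypothetical) so the replies point at an existing post.
INSERT INTO posts VALUES
    (8,  NULL, 8, 'laptop for sale'),
    (9,  NULL, 4, 'NEW lamp shade up for grabs'),
    (10, 8,    7, 'Oooh I could be interested'),
    (11, 8,    7, 'DMing you now!');
INSERT INTO categories VALUES (1, 'Technology');
INSERT INTO categories_posts VALUES (8, 1), (9, 1), (10, 1), (11, 1);
INSERT INTO posts_removed VALUES (10, '2019-01-22 09:08:14');
""")

LATE_JOIN = """
WITH post_data AS (
    SELECT posts.id,
           categories.title,
           creator.display_name AS creator_display_name
    FROM posts
    LEFT JOIN categories_posts ON categories_posts.post_id = posts.id
    LEFT JOIN categories ON categories.id = categories_posts.category_id
    LEFT JOIN users AS creator ON creator.id = posts.created_by
)
SELECT x.id AS matched_post,
       post_data.id,
       post_data.title,
       post_data.creator_display_name
FROM (
    -- Step 1: pick the page of eligible posts using only narrow columns.
    SELECT posts.id, posts.parent_id
    FROM posts
    LEFT JOIN posts_removed AS removed ON removed.post_id = posts.id
    WHERE removed.removed_at IS NULL
    ORDER BY posts.id DESC
    LIMIT ?
) AS x
-- Step 2: only now attach the wide post_data columns (post and its parent).
JOIN post_data ON post_data.id IN (x.id, x.parent_id)
ORDER BY x.id DESC, post_data.id DESC
"""

rows = conn.execute(LATE_JOIN, (2,)).fetchall()
for row in rows:
    print(row)
```

Because each selected post can also bring in its parent's row, the outer result can contain more rows than the inner LIMIT (up to twice as many), so the caller may need to group the rows by matched_post.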