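I have a blog where I track who viewed which posts, and when, by inserting entries into a table (views) with the visitor's IP, the post ID (I have a primary key on these two fields) and a timestamp.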

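This table is then used to display the top 5 posts of each of my categories (there are 4 of them) for the last day/week/month/year and for all time. So a total of 20 queries are executed, each taking between 0.2 and 0.7 seconds... My page load time is over 7 seconds, which is terrible.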

Here is some useful information about my database structure:

+---------------------+        +----------------------+
|   posts (82 rows)   |        |   views (50k rows)   |
+=====================+        +======================+
|    id (primary)     |        |     ip (primary)     |
+---------------------+        +----------------------+
|        type         |        | article_id (primary) |
+---------------------+        +----------------------+
|     thumbnail       |        |     date (index)     |
+---------------------+        +----------------------+
|    title (index)    |       
+---------------------+
|         url         |
+---------------------+
| description (index) |
+---------------------+
|       content       | 
+---------------------+
|        date         |
+---------------------+
|       lastmod       |
+---------------------+
|       sources       |
+---------------------+
|        tags         |
+---------------------+
|      published      |
+---------------------+
|         ...         |
+---------------------+

The ... stands for additional fields for the English version of my posts (url_en, title_en, description_en, tags_en, content_en).

Here is one of my huge queries (they are all basically the same):

SELECT p.title, p.id, p.url, tmp.cnt AS views
FROM posts AS p 
LEFT JOIN (SELECT COUNT(*) AS cnt, article_id -- 0.34s
           FROM views
           WHERE article_id IN (SELECT id
                                FROM posts
                                WHERE id <> 12 AND type = 'Tutoriel') AND 
                 date BETWEEN 01-01-2013 AND NOW() -- the 01-01-2013 is normally a variable but for testing purposes I've replaced it with a fixed date here
           GROUP BY article_id
           ORDER BY cnt DESC LIMIT 5) AS tmp 
       ON p.id = tmp.article_id
WHERE p.id IN (SELECT article_id
               FROM (SELECT COUNT(*) AS cnt, article_id -- 0.34s
                     FROM views
                     WHERE article_id IN (SELECT id
                                          FROM posts
                                          WHERE id <> 12 AND type = 'Tutoriel')
                       AND date BETWEEN 01-01-2013 AND NOW()
                     GROUP BY article_id
                     ORDER BY cnt DESC LIMIT 5) AS tmp2 
              )
ORDER BY views DESC

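I found that the BETWEEN clause is where most of the time goes: I ran the same query for the all-time stats across all posts (so with no dependency on category or date) and it takes only 0.03 seconds to execute.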

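I have looked at this query in every possible way and can't find a simpler, more optimized way to write it... yet I feel there must be one. Maybe I'm just missing something obvious here.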

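One thing that bothers me is my duplicated subquery. I haven't found any other way to fetch both my post data and the associated number of views.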

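One thing I'm considering is firing an AJAX request per period when the user clicks that period's tab (it's a tabbed view). However, that doesn't really solve the problem and just feels like a dirty workaround.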

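I could perhaps partition my posts table in one of the following ways:

  • One table for the French version and another for the English version
  • One table for the commonly used fields (title, description, url) and another for the rest
  • A combination of the above

If I'm not mistaken, this could speed things up.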

Could anyone give me some advice? By the way, thanks for sticking with me this far :)

2 Answers

Older versions of MySQL are particularly bad at optimizing in with a subquery. Try using a join instead:

SELECT p.title, p.id, p.url, tmp.cnt AS views
FROM posts AS p 
LEFT JOIN (SELECT COUNT(*) AS cnt, article_id -- 0.34s
           FROM views
           WHERE article_id IN (SELECT id
                                FROM posts
                                WHERE id <> 12 AND type = 'Tutoriel') AND 
                 date BETWEEN 01-01-2013 AND NOW() -- the 01-01-2013 is normally a variable but for testing purposes I've replaced it with a fixed date here
           GROUP BY article_id
           ORDER BY cnt DESC LIMIT 5) AS tmp 
       ON p.id = tmp.article_id join
          (SELECT COUNT(*) AS cnt, article_id -- 0.34s
           FROM views v join
                (SELECT id
                 FROM posts p
                 WHERE p.id <> 12 AND p.type = 'Tutoriel'
                ) p
                on v.article_id = p.id
            WHERE v.date BETWEEN 01-01-2013 AND NOW()
            GROUP BY v.article_id
            ORDER BY cnt DESC
            LIMIT 5
           ) a
       on p.id = a.article_id
ORDER BY views DESC

Edit:

If I understand the query correctly, you can change the left outer join to a join and eliminate the where clause entirely:

SELECT p.title, p.id, p.url, tmp.cnt AS views
FROM posts AS p JOIN
     (SELECT COUNT(*) AS cnt, article_id -- 0.34s
      FROM views
      WHERE article_id IN (SELECT id
                           FROM posts
                           WHERE id <> 12 AND type = 'Tutoriel') AND 
            date BETWEEN 01-01-2013 AND NOW() -- the 01-01-2013 is normally a variable but for testing purposes I've replaced it with a fixed date here
     GROUP BY article_id
     ORDER BY cnt DESC
     LIMIT 5
    ) tmp 
    ON p.id = tmp.article_id;

Then change the in in the subquery to a join:

SELECT p.title, p.id, p.url, tmp.cnt AS views
FROM posts AS p JOIN
     (SELECT COUNT(*) AS cnt, article_id -- 0.34s
      FROM views v join
           (SELECT distinct p.id  -- distinct may not be necessary
            FROM posts p
            WHERE p.id <> 12 AND p.type = 'Tutoriel'
           ) p
           on v.article_id = p.id
      WHERE date BETWEEN 01-01-2013 AND NOW() -- the 01-01-2013 is normally a variable but for testing purposes I've replaced it with a fixed date here
     GROUP BY article_id
     ORDER BY cnt DESC
     LIMIT 5
    ) tmp 
    ON p.id = tmp.article_id;
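
Whether any of these rewrites helps depends on the plan MySQL picks, so it may be worth checking with EXPLAIN. The sketch below also adds a composite index. The index name idx_views_article_date is hypothetical, and the idea that (article_id, date) suits this query is an assumption that has not been tested against the real data. The date literal is quoted on the assumption that the bound is meant to be 1 January 2013.

```sql
-- Hypothetical index: lets the join on article_id and the date range
-- be resolved from one index instead of the separate date index.
ALTER TABLE views ADD INDEX idx_views_article_date (article_id, date);

-- Inspect the plan of the inner aggregate on its own.
EXPLAIN
SELECT COUNT(*) AS cnt, v.article_id
FROM views v
JOIN posts p ON v.article_id = p.id
WHERE p.id <> 12
  AND p.type = 'Tutoriel'
  AND v.date BETWEEN '2013-01-01' AND NOW()
GROUP BY v.article_id
ORDER BY cnt DESC
LIMIT 5;
```

If the key column of the EXPLAIN row for views shows the new index, the join and the range are being served by it; if not, the index can simply be dropped again.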
answered 2013-07-27T15:47:18.877

Not sure it will help, but if BETWEEN takes a lot of time, maybe turn it into a different condition?

date BETWEEN 01-01-2013 AND NOW()

date > 01-01-2013

That way it doesn't have to compare against two dates, since the value is always between 01-01-2013 and NOW anyway.
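
A side note on the date literal, as a minimal sketch (assuming the date column is a DATETIME or TIMESTAMP, which the question doesn't state): written without quotes, 01-01-2013 is parsed by MySQL as integer subtraction rather than as a date. Whichever form of the condition is used, the bound probably wants to be a quoted literal in YYYY-MM-DD form.

```sql
-- Unquoted, the "date" is just arithmetic: 1 - 1 - 2013.
SELECT 01-01-2013;            -- -2013

-- Quoted literal with the one-sided comparison suggested above.
-- Assumes no rows in views are dated in the future.
SELECT COUNT(*) AS cnt, article_id
FROM views
WHERE date >= '2013-01-01'
GROUP BY article_id
ORDER BY cnt DESC
LIMIT 5;
```

If the real query already passes the date in as a quoted variable, only the one-sided comparison is new here.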

answered 2013-07-27T15:33:18.623