I have a relatively large data setup with relationships 4 levels deep, as follows:
ClientApplication has_many => ClientApplicationVersions
ClientApplicationVersions has_many => CloudLogs
CloudLogs has_many => Logs
client_applications (potentially 1,000 records)
- ...
- account_id
- public_key
- deleted_at
client_application_versions (potentially 10,000 records)
- ...
- client_application_id
- public_key
- deleted_at
cloud_logs (potentially 1,000,000 records)
- ...
- client_application_version_id
- public_key
- deleted_at
logs (potentially 1,000,000,000 records)
- ...
- cloud_log_id
- public_key
- time_stamp
- deleted_at
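I'm still in development, so the structure and setup aren't set in stone, but I would like to get them right. I'm using Rails 3.2.11 with InnoDB on MySQL. The database is currently populated with a small data set compared to its eventual size (logs has only 500,000 rows). I have 4 scoped queries for retrieving logs, 3 of which are problematic: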
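- Grab the first page of logs, ordered by timestamp, limited by account_id, client_application.public_key, and client_application_version.public_key (over 100 seconds)
- Grab the first page of logs, ordered by timestamp, limited by account_id and client_application.public_key (over 100 seconds)
- Grab the first page of logs, ordered by timestamp, limited by account_id (over 100 seconds)
- Grab the first page of logs, ordered by timestamp (about 2 seconds)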
I'm using Rails scopes to help make these calls:
scope :account_id, proc {|account_id| joins(:client_application).where("client_applications.account_id = ?", account_id) }
scope :client_application_key, proc {|client_application_key| joins(:client_application).where("client_applications.public_key = ?", client_application_key) }
scope :client_application_version_key, proc {|client_application_version_key| joins(:client_application_version).where("client_application_versions.public_key = ?", client_application_version_key) }
default_scope order('logs.timestamp DESC')
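I have an index on public_key in every table. I have several indexes on the logs table, including the one the optimizer prefers to use (index_logs_on_cloud_log_id), but the queries still take eons to run.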
Here is how I call the method in the rails console:
Log.account_id(1).client_application_key('p0kZudG0').client_application_version_key('0HgoJRyE').page(1)
...this is what Rails turns it into:
SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `cloud_logs`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` ON `client_application_versions`.`id` = `cloud_logs`.`client_application_version_id` INNER JOIN `client_applications` ON `client_applications`.`id` = `client_application_versions`.`client_application_id` INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON `cloud_logs_logs_join`.`id` = `logs`.`cloud_log_id` INNER JOIN `client_application_versions` `client_application_versions_logs` ON `client_application_versions_logs`.`id` = `cloud_logs_logs_join`.`client_application_version_id` WHERE (logs.deleted_at IS NULL) AND (client_applications.account_id = 1) AND (client_applications.public_key = 'p0kZudG0') AND (client_application_versions.public_key = '0HgoJRyE') ORDER BY logs.timestamp DESC LIMIT 100 OFFSET 0
...and this is the EXPLAIN output for that query:
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | client_application_versions | ref | PRIMARY,index_client_application_versions_on_client_application_id,index_client_application_versions_on_public_key | index_client_application_versions_on_public_key | 768 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | client_applications | eq_ref | PRIMARY,index_client_applications_on_account_id,index_client_applications_on_public_key | PRIMARY | 4 | cloudlog_production.client_application_versions.client_application_id | 1 | Using where |
| 1 | SIMPLE | cloud_logs | ref | PRIMARY,index_cloud_logs_on_client_application_version_id | index_cloud_logs_on_client_application_version_id | 5 | cloudlog_production.client_application_versions.id | 481 | Using where; Using index |
| 1 | SIMPLE | cloud_logs_logs_join | eq_ref | PRIMARY,index_cloud_logs_on_client_application_version_id | PRIMARY | 4 | cloudlog_production.cloud_logs.id | 1 | |
| 1 | SIMPLE | client_application_versions_logs | eq_ref | PRIMARY | PRIMARY | 4 | cloudlog_production.cloud_logs_logs_join.client_application_version_id | 1 | Using index |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+----------------------------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+------------------------------------------------------------------------+------+----------------------------------------------+
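One detail visible in the generated SQL above is that chaining the three scopes appears to join cloud_logs and client_application_versions twice each (once under the plain table name and once under an alias such as cloud_logs_logs_join). The plain-Ruby sketch below just counts the joined tables to make that explicit. The SQL string is abbreviated by hand (the ON conditions are elided), and the variable names are made up for illustration.

```ruby
# Count how many times each table appears after INNER JOIN in the query above.
# The SQL is abbreviated to its JOIN clauses; the ON conditions are elided.
sql = <<-SQL
  SELECT `logs`.* FROM `logs`
  INNER JOIN `cloud_logs` ON ...
  INNER JOIN `client_application_versions` ON ...
  INNER JOIN `client_applications` ON ...
  INNER JOIN `cloud_logs` `cloud_logs_logs_join` ON ...
  INNER JOIN `client_application_versions` `client_application_versions_logs` ON ...
SQL

# Capture the table name that directly follows each INNER JOIN keyword.
joined_tables = sql.scan(/INNER JOIN `(\w+)`/).flatten

# Tally occurrences per table (written without Enumerable#tally so it also
# runs on older Rubies).
join_counts = joined_tables.each_with_object(Hash.new(0)) do |table, counts|
  counts[table] += 1
end

# Tables joined more than once are candidates for a redundant join.
duplicated = join_counts.select { |_table, count| count > 1 }.keys.sort

puts join_counts.inspect
puts "joined more than once: #{duplicated.join(', ')}"
```

Whether the duplicate joins contribute meaningfully to the run time is not something this shows; the EXPLAIN resolves both aliased joins with eq_ref lookups on the primary key, so they may well be cheap.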
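This question has 3 parts:
- Can I optimize my database with additional indexes so that these kinds of join-dependent, sort-dependent queries run more efficiently?
- Can I optimize the Rails code so that this type of find runs more efficiently?
- Am I simply taking the wrong approach to scoped finds like this on large data sets?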
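Update 1/24/12
As Geoff and J_MCCaffrey suggested in their answers, I have split the query into 3 separate parts to try to isolate the problem. As expected, the problem lies in dealing with the largest table. The MySQL optimizer handles this query differently, using different indexes, but the latency remains. Here is the EXPLAIN for this approach.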
ClientApplication.find_by_account_id_and_public_key(1, 'p0kZudG0').versions.select{|cav| cav.public_key == '0HgoJRyE'}.first.logs.page(2)
ClientApplication Load (165.9ms) SELECT `client_applications`.* FROM `client_applications` WHERE `client_applications`.`account_id` = 1 AND `client_applications`.`public_key` = 'p0kZudG0' AND (client_applications.deleted_at IS NULL) ORDER BY client_applications.id LIMIT 1
ClientApplicationVersion Load (105.1ms) SELECT `client_application_versions`.* FROM `client_application_versions` WHERE `client_application_versions`.`client_application_id` = 3 AND (client_application_versions.deleted_at IS NULL) ORDER BY client_application_versions.created_at DESC, client_application_versions.id DESC
Log Load (57295.0ms) SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN (214.5ms) EXPLAIN SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
EXPLAIN for: SELECT `logs`.* FROM `logs` INNER JOIN `cloud_logs` ON `logs`.`cloud_log_id` = `cloud_logs`.`id` WHERE `cloud_logs`.`client_application_version_id` = 49 AND (logs.deleted_at IS NULL) AND (cloud_logs.deleted_at IS NULL) ORDER BY logs.timestamp DESC, cloud_logs.received_at DESC LIMIT 100 OFFSET 100
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | cloud_logs | index_merge | PRIMARY,index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at | 5,9 | NULL | 1874 | Using intersect(index_cloud_logs_on_client_application_version_id,index_cloud_logs_on_deleted_at); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | logs | ref | index_logs_on_cloud_log_id_and_deleted_at_and_timestamp,index_logs_on_cloud_log_id_and_deleted_at,index_logs_on_cloud_log_id,index_logs_on_deleted_at | index_logs_on_cloud_log_id | 5 | cloudlog_production.cloud_logs.id | 4 | Using where |
+----+-------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------+-----------------------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------------+
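A rough way to read the two EXPLAIN outputs above is to multiply the values in their rows column, which the MySQL documentation describes as an approximate indication of how many rows must be examined. The sketch below does that arithmetic in plain Ruby. The numbers are copied from the tables above; the variable and method names are made up for illustration, and the estimates themselves come from index statistics, so they may be far from the real counts.

```ruby
# The `rows` column of each EXPLAIN above, in the order the tables are listed.
chained_scope_rows = [1, 1, 481, 1, 1, 4] # query built from the three chained scopes
split_query_rows   = [1874, 4]            # split query: cloud_logs, then logs

# Multiply the per-table estimates to get a rough count of the row
# combinations MySQL expects to examine.
def estimated_rows_examined(rows_per_table)
  rows_per_table.inject(1) { |product, rows| product * rows }
end

puts estimated_rows_examined(chained_scope_rows) # 1 * 1 * 481 * 1 * 1 * 4
puts estimated_rows_examined(split_query_rows)   # 1874 * 4
```

Both products are in the low thousands (1924 and 7496), which looks small next to the observed run times. That would be consistent with either the estimates being well off or most of the time going into the temporary table and filesort reported in the Extra column, but the numbers alone do not say which.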
Update 1/25/12
Here are the indexes on all of the relevant tables:
CLIENT_APPLICATIONS:
PRIMARY KEY (`id`),
UNIQUE KEY `index_client_applications_on_key` (`key`),
KEY `index_client_applications_on_account_id` (`account_id`),
KEY `index_client_applications_on_deleted_at` (`deleted_at`),
KEY `index_client_applications_on_public_key` (`public_key`)
CLIENT_APPLICATION_VERSIONS:
PRIMARY KEY (`id`),
KEY `index_client_application_versions_on_client_application_id` (`client_application_id`),
KEY `index_client_application_versions_on_deleted_at` (`deleted_at`),
KEY `index_client_application_versions_on_public_key` (`public_key`)
CLOUD_LOGS:
PRIMARY KEY (`id`),
KEY `index_cloud_logs_on_api_client_version_id` (`api_client_version_id`),
KEY `index_cloud_logs_on_client_application_version_id` (`client_application_version_id`),
KEY `index_cloud_logs_on_deleted_at` (`deleted_at`),
KEY `index_cloud_logs_on_device_id` (`device_id`),
KEY `index_cloud_logs_on_public_key` (`public_key`),
KEY `index_cloud_logs_on_received_at` (`received_at`)
LOGS:
PRIMARY KEY (`id`),
KEY `index_logs_on_class_name` (`class_name`),
KEY `index_logs_on_cloud_log_id_and_deleted_at_and_timestamp` (`cloud_log_id`,`deleted_at`,`timestamp`),
KEY `index_logs_on_cloud_log_id_and_deleted_at` (`cloud_log_id`,`deleted_at`),
KEY `index_logs_on_cloud_log_id` (`cloud_log_id`),
KEY `index_logs_on_deleted_at` (`deleted_at`),
KEY `index_logs_on_file_name` (`file_name`),
KEY `index_logs_on_method_name` (`method_name`),
KEY `index_logs_on_public_key` (`public_key`),
KEY `index_logs_on_timestamp` USING BTREE (`timestamp`)
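To compare the logs indexes listed above against the shape of the slow query (look up by cloud_log_id, filter on deleted_at IS NULL, order by timestamp), here is a small plain-Ruby sketch of the leftmost-prefix rule. It is a deliberately simplified model and an assumption on my part, not how the MySQL optimizer actually chooses an index; describe_index and the hash keys are hypothetical names.

```ruby
# Column lists of the relevant indexes on logs, copied from the listing above.
logs_indexes = {
  "index_logs_on_cloud_log_id_and_deleted_at_and_timestamp" => %w[cloud_log_id deleted_at timestamp],
  "index_logs_on_cloud_log_id_and_deleted_at"               => %w[cloud_log_id deleted_at],
  "index_logs_on_cloud_log_id"                              => %w[cloud_log_id],
  "index_logs_on_deleted_at"                                => %w[deleted_at],
  "index_logs_on_timestamp"                                 => %w[timestamp]
}

# Simplified model (an assumption, not the optimizer's real logic):
# count the leading index columns constrained by the lookup, then check
# whether the very next index column is the ORDER BY column.
def describe_index(columns, lookup_columns, order_column)
  prefix = columns.take_while { |column| lookup_columns.include?(column) }
  { :lookup_prefix => prefix.length, :ordered => columns[prefix.length] == order_column }
end

# Join column plus the deleted_at IS NULL filter from the slow query.
lookup = %w[cloud_log_id deleted_at]

report = logs_indexes.map do |name, columns|
  [name, describe_index(columns, lookup, "timestamp")]
end

report.each { |name, info| puts "#{name}: #{info.inspect}" }
```

Under this model only the three-column index both narrows the lookup and keeps rows in timestamp order, and even then only within a single cloud_log_id. Rows coming from many cloud_logs would still have to be merged and sorted, which may be part of why both EXPLAIN outputs report Using temporary; Using filesort.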