I have the following MySQL query:

SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');

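The problem is that the pool table contains millions of rows, and this query takes over 12 minutes to complete. I realize that this query scans the entire left table (pool). The pool table has an auto-increment id column.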

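I would like to split this query into multiple queries so that, instead of scanning the entire pool table, each one scans 1000 rows at a time. Each subsequent query would pick up where the previous one left off (1000-2000, 2000-3000, and so on), using the id column to keep track.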

How can I specify this in my query? If you know the answer, please show an example. Thank you.

In case it helps, here are my indexes:

mysql> show index from main.pool;
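+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |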
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pool  |          0 | PRIMARY  |            1 | id          | A         |     9275039 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | username |            1 | username    | A         |     9275039 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | source   |            1 | source      | A         |           1 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | location |            1 | location    | A         |       38168 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | pdex     |            1 | gender      | A         |           2 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | pdex     |            2 | username    | A         |     9275039 |     NULL | NULL   |      | BTREE      |         |
| pool  |          1 | pdex     |            3 | id          | A         |     9275039 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.00 sec)

mysql> show index from main.sent;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sent  |          0 | PRIMARY  |            1 | primary_key | A         |         351 |     NULL | NULL   |      | BTREE      |         |
| sent  |          1 | username |            1 | username    | A         |         175 |     NULL | NULL   |      | BTREE      |         |
| sent  |          1 | sdex     |            1 | campid      | A         |           7 |     NULL | NULL   |      | BTREE      |         |
| sent  |          1 | sdex     |            2 | username    | A         |         351 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Here is the EXPLAIN output for my query:

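+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+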
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows    | Extra                                |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+
|  1 | SIMPLE      | pool  | ref   | location,pdex | pdex | 5       | const | 6084332 | Using where                          |
|  1 | SIMPLE      | sent  | index | sdex          | sdex | 309     | NULL  |     351 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+

Here is the structure of the pool table:

| pool  | CREATE TABLE `pool` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`username` varchar(50) CHARACTER SET utf8 NOT NULL,
`source` varchar(10) CHARACTER SET utf8 NOT NULL,
`gender` varchar(1) CHARACTER SET utf8 NOT NULL,
`location` varchar(50) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `source` (`source`),
KEY `location` (`location`),
KEY `pdex` (`gender`,`username`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9327026 DEFAULT CHARSET=latin1 |

Here is the structure of the sent table:

| sent  | CREATE TABLE `sent` (
`primary_key` int(50) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`from` varchar(50) NOT NULL,
`campid` varchar(255) NOT NULL,
`timestamp` int(20) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `username` (`username`),
KEY `sdex` (`campid`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=352 DEFAULT CHARSET=latin1 |

The following produces a syntax error, but a WHERE clause like the one near the top is what I'm after:

SELECT pool.username
FROM pool
WHERE id < 1000
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (location = 'united states' OR location = 'us' OR location = 'usa');

2 Answers

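It looks like it is using pool.location. You could try adding an index on gender, but that probably won't help much. Normalizing location to a country code in the data, and indexing that, could be useful.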
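
As a rough sketch of what normalizing location to a country code could look like: the column name `country_code` and the index name `country` below are hypothetical, and the `UPDATE` covers only the three spellings that appear in the question.

```sql
-- Hypothetical column holding a normalized country code.
ALTER TABLE pool ADD COLUMN country_code CHAR(2) NOT NULL DEFAULT '';

-- Map the free-text spellings from the question onto one code.
UPDATE pool
SET country_code = 'US'
WHERE location IN ('united states', 'us', 'usa');

-- Index the normalized column (hypothetical index name).
ALTER TABLE pool ADD INDEX country (country_code);

-- The three-way OR on location then becomes a single equality test.
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
  AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
  AND pool.gender = 'f'
  AND pool.country_code = 'US';
```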

But the first index to add looks to me like the one that could seriously reduce the number of records it has to test.

answered 2012-06-12T10:34:08.960

Splitting the query does not sound like the right approach.

A better approach would be to fetch some records from your existing query, send the messages, and then continue fetching.
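
One way to read "fetch some, send, continue" is to put a LIMIT on the existing query. This is only a sketch, and it assumes that sending a message inserts a row into sent with the same campid, so usernames that have already been handled drop out of the result the next time the query runs.

```sql
-- Fetch the next batch of up to 1000 usernames not yet sent for this campaign.
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
  AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
  AND pool.gender = 'f'
  AND pool.location IN ('united states', 'us', 'usa')
LIMIT 1000;

-- After sending (and recording the sends in sent), run the same query again
-- to get the next batch; stop when it returns no rows.
```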


Your query could benefit from another composite index:

pool( location, gender, username )

This should allow your full query to be run from sdex and your new index.
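
A minimal sketch of creating that index and re-checking the plan. The index name `loc_gender_user` is made up here; the column order follows the suggestion above.

```sql
-- Composite index on the columns the query filters and joins on.
ALTER TABLE pool ADD INDEX loc_gender_user (location, gender, username);

-- Re-run EXPLAIN to see whether the optimizer picks the new index.
EXPLAIN
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
  AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
  AND pool.gender = 'f'
  AND pool.location IN ('united states', 'us', 'usa');
```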


If you really want to split the query, a simple approach could be

SELECT MIN(id), MAX(id) FROM pool

Then loop from min to max in steps of 1000, and add id >= r AND id < r+1000 to your query.

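If you have gaps in the ids, a step may return 0 rows, but it will never return more than 1000 rows at a time. A different composite index on pool containing (id, location, gender, and possibly username) might help this query.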
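
A sketch of a single step of that loop, using a MySQL user variable `@r` (a name chosen here for illustration) as the lower bound of the current window. Note that the id range goes into the one WHERE clause after the join; a second WHERE placed before the LEFT JOIN is what causes the syntax error in the question.

```sql
-- Find the id range to walk over.
SELECT MIN(id), MAX(id) FROM pool;

-- Lower bound of the current window; start it at the MIN(id) found above.
SET @r := 1;

-- One batch: only pool rows with id in [@r, @r + 1000) are considered.
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
  AND sent.campid = 'YA1LGfh9'
WHERE pool.id >= @r AND pool.id < @r + 1000
  AND sent.username IS NULL
  AND pool.gender = 'f'
  AND pool.location IN ('united states', 'us', 'usa');

-- Advance the window and repeat until @r exceeds MAX(id).
SET @r := @r + 1000;
```

In practice the loop itself would usually live in the application code that sends the messages, with `@r` replaced by a bound parameter.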

answered 2012-06-12T10:41:39.877