MySQL版本:

1
2
3
4
5
6
SELECT VERSION();
+------------+
| VERSION() |
+------------+
| 5.7.21-log |
+------------+

建表语句:

1
2
3
4
5
6
7
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`agent_id` int(11) DEFAULT NULL COMMENT '代理商ID',
`name` varchar(16) DEFAULT NULL COMMENT '姓名',
`score` int(11) DEFAULT '0' COMMENT '积分',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='用户表';
1
2
3
4
5
6
7
CREATE TABLE `agent` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` varchar(16) DEFAULT NULL COMMENT '名称',
`level` int(11) DEFAULT NULL COMMENT '级别',
PRIMARY KEY (`id`),
KEY `idx_level` (`level`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='代理商';

为用户表建立索引:

1
CREATE INDEX idx_agentid ON user(agent_id);

为用户表建立索引:

1
CREATE INDEX idx_score_name ON user(score, name);

为代理商建立索引:

1
CREATE INDEX idx_level ON agent(level);

用户表索引列表:

1
2
3
4
5
6
7
8
9
SHOW INDEX FROM user;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user | 0 | PRIMARY | 1 | id | A | 19285676 | NULL | NULL | | BTREE | | |
| user | 1 | idx_agentid | 1 | agent_id | A | 5952 | NULL | NULL | YES | BTREE | | |
| user | 1 | idx_score_name | 1 | score | A | 28654 | NULL | NULL | YES | BTREE | | |
| user | 1 | idx_score_name | 2 | name | A | 19970922 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

代理商索引列表:

1
2
3
4
5
6
7
SHOW INDEX FROM agent;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| agent | 0 | PRIMARY | 1 | id | A | 5 | NULL | NULL | | BTREE | | |
| agent | 1 | idx_level | 1 | level | A | 2 | NULL | NULL | YES | BTREE | | |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

1. ORDER BY 优化

MySLQ官方文档给出了能够使用索引加速ORDER BY排序以及无法通过使用索引加速ORDER BY排序的场景案例,下面来一一列举。

1.1 使用索引加速排序

1.1.1 案例1

使用复合索引(多列索引)中的一个或多个列进行排序。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列排序的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

1
SELECT * FROM t1 ORDER BY key_part1, key_part2;
1
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC;
1
2
3
4
5
6
EXPLAIN SELECT * FROM user ORDER BY score, name;
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+
| 1 | SIMPLE | user | NULL | ALL | NULL | NULL | NULL | NULL | 20434945 | 100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------+

从执行计划的结果来看,Extra出现了Using filesort,这就是通常说的Filesort排序。
Filesort排序并不代表是通过磁盘文件进行的排序,而只是说数据库服务器需要对数据进行一次额外的排序。凡是不是通过索引直接返回排序结果的排序,都叫Filesort排序。
Filesort排序算法首先是尝试将取出的数据一次性加载到内存中进行排序,排序最大能使用的内存空间大小是由系统变量sort_buffer_size决定的。如果内存空间无法容下所有取出的数据,那么排序就会分解成多个更小的排序,然后每次只排序其中一小部分的数据,并将每次排序的结果存储到磁盘临时文件中,最后再将临时文件中的数据进行一次排序和合并结果输出。
每将各个有序的小数据块合并成一个有序的结果集就会增加Sort_merge_passes的数值。因此,如果数据库服务器的Sort_merge_passes的数值过大,可以考虑适当增加sort_buffer_size的数值以加速Filesort排序的操作。但是需要注意,系统变量sort_buffer_size配置的内存空间是每个线程独占的,不宜设置过大,应该综合考虑数据库连接数量和服务器内存的总大小。

# 查看系统 sort_merge_passes 状态的值


1
2
3
4
5
6
SHOW GLOBAL STATUS LIKE 'sort_merge_passes';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Sort_merge_passes | 2461 |
+-------------------+-------+

# 查看系统 sort_buffer_size 变量的值


1
2
3
4
5
6
SHOW VARIABLES LIKE 'sort_buffer_size';
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| sort_buffer_size | 262144 |
+------------------+--------+

# 设置系统 sort_buffer_size 变量的值, 1M


1
set GLOBAL sort_buffer_size=1024*1024*1;

使用覆盖索引的方式改写上面的查询语句:

1
2
3
4
5
6
EXPLAIN SELECT id FROM user ORDER BY score, name;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | idx_score_name | 56 | NULL | 20434945 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-------------+

从执行计划的结果来看,Extra出现了Using index,已经没有Using filesort了。

1.1.2 案例2

使用复合索引(多列索引)中的一部分列做等值查询,一部列做排序操作。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列排序的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

1
SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2;
1
SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part1 DESC, key_part2 DESC;
1
2
3
4
5
6
EXPLAIN SELECT * FROM user WHERE score = 100 ORDER BY name;
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+
| 1 | SIMPLE | user | NULL | ref | idx_score_name | idx_score_name | 5 | const | 38268 | 100.00 | Using index condition |
+----+-------------+-------+------------+------+----------------+----------------+---------+-------+-------+----------+-----------------------+
1.1.3 案例3

使用复合索引(多列索引)中的一部分列做比较值查询,一部列做排序操作。要遵循最左前缀匹配原则。如果是使用复合索引中的多个列进行排序,这些列排序的顺序要么都是升序,要么都是降序,否则无法使用索引加速排序操作。

1
SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC;
1
SELECT * FROM t1 WHERE key_part1 < constant ORDER BY key_part1 DESC;
1
2
3
4
5
6
EXPLAIN SELECT * FROM user WHERE score < 10 ORDER BY score;
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+
| 1 | SIMPLE | user | NULL | range | idx_score_name | idx_score_name | 5 | NULL | 417240 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+--------+----------+-----------------------+

比较值的场景下,优化器不一定会选择采用索引的方式,主要还得看表数据量的大小以及MySQL估算查询要扫描的行数来决定是否选择走索引。

1.2 无法使用索引加速排序

1.2.1 案例1

使用多个不同的单列索引进行排序。

1
SELECT * FROM t1 ORDER BY key1, key2;
1
2
3
4
5
6
EXPLAIN SELECT id FROM user ORDER BY id, agent_id;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | idx_agentid | 5 | NULL | 20434945 | 100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+----------+----------+-----------------------------+

为了确保查询走索引,这里采用了覆盖索引的方式。

1.2.2 案例2

使用复合索引(多列索引)的多个列进行排序时,不遵循最左前缀匹配原则。

1
SELECT * FROM t1 WHERE key2=constant ORDER BY key_part1, key_part3;
1
2
3
4
5
6
EXPLAIN SELECT id FROM user ORDER BY name, score;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | idx_score_name | 56 | NULL | 20434945 | 100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+

本文没有建立三个列的复合索引,这里采用覆盖索引的方式来实现,达到的效果是一样的。

1.2.3 案例3

使用复合索引(多列索引)的多个列进行排序时,同时存在升序和降序的混合排序。

1
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
1
2
3
4
5
6
EXPLAIN SELECT id FROM user ORDER BY score, name DESC;
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | user | NULL | index | NULL | idx_score_name | 56 | NULL | 20434945 | 100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+----------+----------+-----------------------------+

为了确保查询走索引,这里采用了覆盖索引的方式。

2. LIMIT 优化

2.1 案例

查询代理商ID为2下面的所有用户信息,每页展示20条数据,起始值从200万行开始。为保证优化前和优化后返回相同的数据列表,这里增加了根据主键排序的条件:

1
2
3
4
5
6
EXPLAIN SELECT * FROM user WHERE agent_id = 2 ORDER BY id LIMIT 2000000, 20;
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+
| 1 | SIMPLE | user | NULL | ref | idx_agentid | idx_agentid | 5 | const | 5221432 | 100.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+---------+----------+-----------------------+

在一张数据量为2000万行的表中,真实的查询耗时为3 min 0.54 sec

2.2 优化

思路:采用覆盖索引的方式先取出已经通过索引加速排序好的分页数据的行记录ID,然后再通过行记录ID关联回表查询所需的所有数据。这样可以减少全表扫描的行数,提升查询效率。

1
2
3
4
5
6
7
8
EXPLAIN SELECT * FROM user INNER JOIN (SELECT id FROM user WHERE agent_id = 2 ORDER BY id LIMIT 2000000, 20) tmp USING(id);
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2000020 | 100.00 | NULL |
| 1 | PRIMARY | user | NULL | eq_ref | PRIMARY | PRIMARY | 4 | tmp.id | 1 | 100.00 | NULL |
| 2 | DERIVED | user | NULL | ref | idx_agentid | idx_agentid | 5 | const | 5221432 | 100.00 | Using where; Using index |
+----+-------------+------------+------------+--------+---------------+-------------+---------+--------+---------+----------+--------------------------+

在一张数据量为2000万行的表中,真实的查询耗时为0.42 sec。优化后的查询时间是毫秒级别的,速度提升了约180倍。

3. IN 优化

3.1 案例

查询二级代理商下的所有用户信息,因测试数据量过大,此处只查询前20条数据。为保证优化前和优化后返回相同的数据列表,这里增加了根据主键排序的条件:

1
2
3
4
5
6
7
EXPLAIN SELECT * FROM user WHERE agent_id IN (SELECT id FROM agent WHERE level = 2) ORDER BY id LIMIT 20;
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | agent | NULL | ref | PRIMARY,idx_level | idx_level | 5 | const | 3 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | user | NULL | ref | idx_agentid | idx_agentid | 5 | test.agent.id | 3433 | 100.00 | NULL |
+----+-------------+-------+------------+------+-------------------+-------------+---------+---------------+------+----------+----------------------------------------------+

代理商表6行数据,用户表2000万行数据,真实的查询耗时为13 min 14.73 sec,极慢,无法忍受。

通过trace分析优化器是如何选择执行计划的:

可以看到,MySQL优化器认为使用JOIN的方式能够更高效率的查找到数据行,因此内部将查询语句改写成了SEMI JOIN(半连接)。也由此可见,在MySQL中,半连接优化的效率也未必高。

3.2 优化

思路:IN查询的性能很糟糕,建议使用EXISTS来等效的改写查询以获得更好的查询性能。

1
2
3
4
5
6
7
EXPLAIN SELECT * FROM user WHERE EXISTS (SELECT 1 FROM agent WHERE level = 2 AND user.agent_id = agent.id) ORDER BY id LIMIT 20;
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+
| 1 | PRIMARY | user | NULL | index | NULL | PRIMARY | 4 | NULL | 20 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | agent | NULL | eq_ref | PRIMARY,idx_level | PRIMARY | 4 | test.user.agent_id | 1 | 60.00 | Using where |
+----+--------------------+-------+------------+--------+-------------------+---------+---------+--------------------+------+----------+-------------+

代理商表6行数据,用户表2000万行数据,真实的查询耗时为0.02 sec,提升效率是显而易见的。

3.3 案例

上面是涉及关联子查询的情况,如果IN后面是常量而非子查询的情况,效率是很客观的:

1
2
3
4
5
6
EXPLAIN SELECT * FROM user WHERE agent_id IN(2, 3) ORDER BY id LIMIT 20;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | idx_agentid | PRIMARY | 4 | NULL | 41 | 48.58 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

在一张数据量为2000万行的表中,真实的查询耗时基本为0.00 sec(没有查询缓存)。