MySQL-如何合理正确使用索引

前言:

我们经常通过创建索引进行sql语句的优化，特别是联机交易(OLTP)场景下的数据库，通过创建索引可以使语句的执行效率提供10倍、100倍，但有些时候索引的使用却没有达到我们的预期，出现索引失效或者索引扫描效率低下的问题，这是因为我们没有掌握正确的索引使用方法所导致。

本文主要讲述如何合理正确的使用索引，索引使用原则。

索引使用原则:

1 避免列索引过多

2 索引列的区分度要高

3 区分度高的列放左边

4 匹配左边列

5 匹配列左前缀

6 范围扫描后的数据是无序的

7 避免左边列函数运算

8 避免隐式转化

测试表TEST(MySQL 5.7.32):


 CREATE TABLE `test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(200) DEFAULT NULL,
  `net_address` varchar(50) DEFAULT NULL,
  `age` bigint(20) DEFAULT NULL,
  `name1` varchar(20) DEFAULT NULL,
  `insert_date` datetime DEFAULT NULL,
  `name3` varchar(20) DEFAULT NULL,
  `insert_date1` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=724285 DEFAULT CHARSET=utf8mb4 |

1 表避免列索引过多

创建测试表TEST

create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));

只有主键的索引8个并行插入100W数据需要266秒


[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
  Average number of seconds to run all queries: 266.121 seconds
  Minimum number of seconds to run all queries: 266.121 seconds
  Maximum number of seconds to run all queries: 266.121 seconds
  Number of clients running queries: 8
  Average number of queries per client: 125000

接下来，在表上创建6个索引，重新进行8个并行插入100w数据需要282秒


drop table test;
create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20)); 
alter table test add index(age);
alter table test add index(net_address);
alter table test add index(name);
alter table test add index(name1);
alter table test add index(name2);
alter table test add index(name3);
mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=16 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
  Average number of seconds to run all queries: 282.414 seconds
  Minimum number of seconds to run all queries: 282.414 seconds
  Maximum number of seconds to run all queries: 282.414 seconds
  Number of clients running queries: 8
  Average number of queries per client: 125000

总结:表上索引过多会影响表的insert,delete,update性能,因为这些DML操作会维护索引

2 索引列的区分度要高

表列name1的区分度很低，只有2个不同值


root@mysql.sock 22:42:  [db1]>select name1,count(*) from test group by name1;
+-------+----------+
| name1 | count(*) |
+-------+----------+
| a     |  1000000 |
| b     |  1000000 |
+-------+----------+

在name1上创建索引ind_name1

alter table test add index ind_name1(name1);

按name1条件执行查询sql,索引扫描的效率很低，执行时间为1.33秒，EXTRA为null，表示扫描索引后需要回表


root@mysql.sock 22:45:  [db1]>explain select sum(age) from test where name1='a';
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref   | rows    | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
|  1 | SIMPLE      | test  | NULL       | ref  | ind_name1     | ind_name1 | 83      | const | 1003085 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 22:45:  [db1]>select sum(age) from test where name1='a';
+----------+
| sum(age) |
+----------+
|  1000000 |
+----------+
1 row in set (1.33 sec)

忽略索引，让sql走全表扫描,执行时间0.60秒，执行效率高于走索引


root@mysql.sock 22:50:  [db1]>explain select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2006170 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 22:50:  [db1]>select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----------+
| sum(age) |
+----------+
|  1000000 |
+----------+
1 row in set (0.60 sec)

总结:创建索引的列区分度要高，这样才能避免出现高消耗的回表

3 区分度高的列放左边

表TEST列age的区分度为51个值，列insert_date的区分度为316


root@mysql.sock 21:06:  [db1]>select count( distinct age) from test;
+----------------------+
| count( distinct age) |
+----------------------+
|                   51 |
+----------------------+
root@mysql.sock 21:00:  [db1]>select count(distinct insert_date) from test;
+-----------------------------+
| count(distinct insert_date) |
+-----------------------------+
|                         316 |
+-----------------------------+
1 row in set (0.00 sec)

将区分度低的age列放在左边第一列，区分度高的insert_date放在第二列

root@mysql.sock 21:11:  [db1]>alter table test add index ind_test_2(age,insert_date);

测试查询索引的访问效率，执行时间为0.11秒，filtered的效率很低只有11.11


root@mysql.sock 21:11:  [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys         | key        | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_5,ind_test_2 | ind_test_2 | 9       | NULL | 308779 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 21:11:  [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.11 sec)

将区分度低的insert_date列放在左边第一列，区分度高的age放在第二列


root@mysql.sock 21:11:  [db1]>alter table test drop index ind_test_2;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0
root@mysql.sock 21:11:  [db1]>alter table test add index ind_test_2(insert_date,age);
Query OK, 0 rows affected (1.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

测试查询索引的访问效率，执行时间为0.001秒，filtered的效率有50


root@mysql.sock 21:12:  [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys         | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_5,ind_test_2 | ind_test_2 | 6       | NULL |    8 |    50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 21:12:  [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)
root@mysql.sock 21:12:  [db1]>

总结:创建索引区分度高的列要放在左边，这样才能最大划的过滤数据，提示索引的访问效率

4 匹配左边列

创建索引ind_test_2,age在左边第一列,insert_date在第二列

alter table test add key ind_test_2(age,insert_date);

使用列age进行查询，可以正常使用到索引


root@mysql.sock 15:51:  [db1]>explain select sum(length(name3)) from test where age<2;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2    | ind_test_2 | 9       | NULL | 2000 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+

使用insert_date进行查询，没有使用上索引，因为insert_date在索引的第二列


root@mysql.sock 11:23:  [db1]>explain select sum(length(name3)) from test where insert_date>'2022-05-16 15:44:30' and insert_date<'2022-05-20 15:44:30';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199121 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

总结:在创建索引或者写sql语句的时候，需要在使用到的列放在索引的左边，MySQL5.7没有跳跃索引访问的功能，在MySQL 8.0实现了跳跃索引访问功能，但跳跃索引的性能肯定不及范围索引扫描

5 匹配列左前缀

创建索引ind_test_4

alter table test add key ind_test_4(net_address);

使用左边模糊匹配，可以使用上索引，并且过滤很高


root@mysql.sock 11:37:  [db1]>explain select sum(length(name3)) from test where net_address like 'aaa.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_4    | ind_test_4 | 203     | NULL |    4 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

使用右边模糊匹配，没有使用索引，过滤只有11.11，执行效率下降


root@mysql.sock 11:37:  [db1]>explain select sum(length(name3)) from test where net_address like '%.cn';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199125 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

总结:索引是按顺序存放的，如果只有列的右边进行模糊匹配，会导致访问效率降低，对于需要进行右边匹配的查询，建议将值倒序存放

倒序存放方法


---倒序存放
update test set net_address=reverse(net_address);
select name3,reverse(net_address) from test where net_address like 'nc.%';
root@mysql.sock 11:43:  [db1]>select name3,reverse(net_address) from test where net_address like 'nc.%';
+-------+----------------------+
| name3 | reverse(net_address) |
+-------+----------------------+
| xx    | aaa.xxxxx.cn         |
| xx    | aaa.xxxxx.cn         |
+-------+----------------------+
2 rows in set (0.00 sec)
root@mysql.sock 11:43:  [db1]>explain select name3,reverse(net_address) from test where net_address like 'nc.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_4    | ind_test_4 | 203     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

6 范围扫描后的数据是无序的

创建组合索引ind_test_5，有列age,net_address,insert_date组成

alter table test add key ind_test_5(age,net_address,insert_date);

执行以下sql,等值查询之后按单个条件进行范围扫描，通过索引可以消除order by的排序没有使用filesort排序，说明索引等值查询之后，索引其他列的数据依然是有序的


root@mysql.sock 11:54:  [db1]>explain select insert_date 
  from test 
  where age=20  and net_address ='aaa.xxxxx.cn' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29' 
  order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                    | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218     | NULL |    2 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+

执行以下sql，等值查询之后进行两个条件的范围扫描，可以发现使用了filesort文件排序，说明范围扫描之后，索引其他列的数据是乱序的


root@mysql.sock 20:17:  [db1]>explain select insert_date    from test    where age=20  and net_address like 'aaa.xxxxx.cn%' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29'    order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys                    | key        | key_len | ref  | rows | filtered | Extra                                    |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218     | NULL |    2 |     2.50 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 20:21:  [db1]>

总结:索引进行范围扫描之后的数据是乱序的，这时候的索引查询结果没法消除排序，需要进行filesort文件排序

7 避免左边列函数运算

创建索引ind_test_6

alter table test add key ind_test_6(age);

对age在左边进行运算age-2,可以发现sql没有使用上索引，索引失效


root@mysql.sock 20:28:  [db1]>explain select name    from test    where age-50=2;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199121 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

对age在右边进行运算50+2,可以发现sql正常使用上索引ind_test_6


root@mysql.sock 20:28:  [db1]>explain select name    from test    where age=50+2;
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys         | key        | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | ind_test_6 | ind_test_6 | 9       | const | 2000 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+

总结:避免对左边列进行函数或者算术运算，会导致索引失效

8 避免隐式转化

创建索引ind_test_7

alter table test add key ind_test_7(insert_date1);

按条件insert_date1进行范围查询，可以使用索引


root@mysql.sock 20:56:  [db1]>explain select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_1    | ind_test_1 | 803     | NULL | 4000 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
root@mysql.sock 20:56:  [db1]>select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----------+
| count(*) |
+----------+
|     4000 |
+----------+
1 row in set (0.00 sec)

按条件insert_date1进行范围查询，将输入的number类型的值，可以使用索引，但执行时间增加为0.05秒，过滤下降到11.11，并且type类似为index，全索引扫描


root@mysql.sock 20:56:  [db1]>explain select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE        | test  | NULL       | index | ind_test_1    | ind_test_1 | 803     | NULL | 199100 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
root@mysql.sock 20:57:  [db1]>select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----------+
| count(*) |
+----------+
|     4000 |
+----------+
1 row in set (0.05 sec)

总结:过滤条件要避免发生隐式转化，隐式转化会引发索引数据访问低效以及索引失效问题

相关阅读:
HTML做一个传统节日端午节带设计报告4500字
 记： CTF2023羊城杯 - Reverse 方向 Blast 题目复现and学习记录
 Linux MMC子系统 - 5.eMMC 5.1工作模式-引导模式
 Nacos2.X源码阅读总结
 QT-day1
char[]、char*与string相互转换——独家原创
 【IVI】15.1.2 系统稳定性优化篇（LMKD Ⅱ）PSI 压力失速信息
 idea 通过tomcat 配置 https方式访问
 多线程的知识补充
 用Wokwi仿真ESP-IDF项目
原文地址：https://blog.csdn.net/sinat_36757755/article/details/126562400