• MySQL-如何合理正确使用索引


    前言:

    我们经常通过创建索引进行sql语句的优化,特别是联机交易(OLTP)场景下的数据库,通过创建索引可以使语句的执行效率提供10倍、100倍,但有些时候索引的使用却没有达到我们的预期,出现索引失效或者索引扫描效率低下的问题,这是因为我们没有掌握正确的索引使用方法所导致。

    本文主要讲述如何合理正确的使用索引,索引使用原则。

    索引使用原则:

    1 避免列索引过多

    2 索引列的区分度要高

    3 区分度高的列放左边

    4 匹配左边列

    5 匹配列左前缀

    6 范围扫描后的数据是无序的

    7 避免左边列函数运算

    8 避免隐式转化

    测试表TEST(MySQL 5.7.32):

    1. CREATE TABLE `test` (
    2. `id` bigint(20) NOT NULL AUTO_INCREMENT,
    3. `name` varchar(200) DEFAULT NULL,
    4. `net_address` varchar(50) DEFAULT NULL,
    5. `age` bigint(20) DEFAULT NULL,
    6. `name1` varchar(20) DEFAULT NULL,
    7. `insert_date` datetime DEFAULT NULL,
    8. `name3` varchar(20) DEFAULT NULL,
    9. `insert_date1` varchar(200) DEFAULT NULL,
    10. PRIMARY KEY (`id`)
    11. ) ENGINE=InnoDB AUTO_INCREMENT=724285 DEFAULT CHARSET=utf8mb4 |

    表避免列索引过多

    创建测试表TEST

    create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));

    只有主键的索引8个并行插入100W数据需要266秒

    1. [root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
    2. mysqlslap: [Warning] Using a password on the command line interface can be insecure.
    3. Benchmark
    4. Average number of seconds to run all queries: 266.121 seconds
    5. Minimum number of seconds to run all queries: 266.121 seconds
    6. Maximum number of seconds to run all queries: 266.121 seconds
    7. Number of clients running queries: 8
    8. Average number of queries per client: 125000

    接下来,在表上创建6个索引,重新进行8个并行插入100w数据需要282秒

    1. drop table test;
    2. create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));
    3. alter table test add index(age);
    4. alter table test add index(net_address);
    5. alter table test add index(name);
    6. alter table test add index(name1);
    7. alter table test add index(name2);
    8. alter table test add index(name3);
    9. mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=16 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
    10. [root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
    11. mysqlslap: [Warning] Using a password on the command line interface can be insecure.
    12. Benchmark
    13. Average number of seconds to run all queries: 282.414 seconds
    14. Minimum number of seconds to run all queries: 282.414 seconds
    15. Maximum number of seconds to run all queries: 282.414 seconds
    16. Number of clients running queries: 8
    17. Average number of queries per client: 125000

    总结:表上索引过多会影响表的insert,delete,update性能,因为这些DML操作会维护索引

    2 索引列的区分度要高

    表列name1的区分度很低,只有2个不同值

    1. root@mysql.sock 22:42: [db1]>select name1,count(*) from test group by name1;
    2. +-------+----------+
    3. | name1 | count(*) |
    4. +-------+----------+
    5. | a | 1000000 |
    6. | b | 1000000 |
    7. +-------+----------+

    在name1上创建索引ind_name1

    alter table test add index ind_name1(name1);

    按name1条件执行查询sql,索引扫描的效率很低,执行时间为1.33秒,EXTRA为null,表示扫描索引后需要回表

    1. root@mysql.sock 22:45: [db1]>explain select sum(age) from test where name1='a';
    2. +----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
    5. | 1 | SIMPLE | test | NULL | ref | ind_name1 | ind_name1 | 83 | const | 1003085 | 100.00 | NULL |
    6. +----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
    7. 1 row in set, 1 warning (0.00 sec)
    8. root@mysql.sock 22:45: [db1]>select sum(age) from test where name1='a';
    9. +----------+
    10. | sum(age) |
    11. +----------+
    12. | 1000000 |
    13. +----------+
    14. 1 row in set (1.33 sec)

    忽略索引,让sql走全表扫描,执行时间0.60秒,执行效率高于走索引

    1. root@mysql.sock 22:50: [db1]>explain select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
    2. +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    5. | 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 2006170 | 10.00 | Using where |
    6. +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
    7. 1 row in set, 1 warning (0.00 sec)
    8. root@mysql.sock 22:50: [db1]>select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
    9. +----------+
    10. | sum(age) |
    11. +----------+
    12. | 1000000 |
    13. +----------+
    14. 1 row in set (0.60 sec)

     总结:创建索引的列区分度要高,这样才能避免出现高消耗的回表

    3 区分度高的列放左边

    表TEST列age的区分度为51个值,列insert_date的区分度为316

    1. root@mysql.sock 21:06: [db1]>select count( distinct age) from test;
    2. +----------------------+
    3. | count( distinct age) |
    4. +----------------------+
    5. | 51 |
    6. +----------------------+
    7. root@mysql.sock 21:00: [db1]>select count(distinct insert_date) from test;
    8. +-----------------------------+
    9. | count(distinct insert_date) |
    10. +-----------------------------+
    11. | 316 |
    12. +-----------------------------+
    13. 1 row in set (0.00 sec)

    将区分度低的age列放在左边第一列,区分度高的insert_date放在第二列

    root@mysql.sock 21:11:  [db1]>alter table test add index ind_test_2(age,insert_date);

    测试查询索引的访问效率,执行时间为0.11秒,filtered的效率很低只有11.11

    1. root@mysql.sock 21:11: [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
    2. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_5,ind_test_2 | ind_test_2 | 9 | NULL | 308779 | 11.11 | Using where; Using index |
    6. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
    7. 1 row in set, 1 warning (0.00 sec)
    8. root@mysql.sock 21:11: [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
    9. +----------+
    10. | count(*) |
    11. +----------+
    12. | 4 |
    13. +----------+
    14. 1 row in set (0.11 sec)

    将区分度低的insert_date列放在左边第一列,区分度高的age放在第二列

    1. root@mysql.sock 21:11: [db1]>alter table test drop index ind_test_2;
    2. Query OK, 0 rows affected (0.04 sec)
    3. Records: 0 Duplicates: 0 Warnings: 0
    4. root@mysql.sock 21:11: [db1]>alter table test add index ind_test_2(insert_date,age);
    5. Query OK, 0 rows affected (1.06 sec)
    6. Records: 0 Duplicates: 0 Warnings: 0

    测试查询索引的访问效率,执行时间为0.001秒,filtered的效率有50

    1. root@mysql.sock 21:12: [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
    2. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_5,ind_test_2 | ind_test_2 | 6 | NULL | 8 | 50.00 | Using where; Using index |
    6. +----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
    7. 1 row in set, 1 warning (0.00 sec)
    8. root@mysql.sock 21:12: [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
    9. +----------+
    10. | count(*) |
    11. +----------+
    12. | 4 |
    13. +----------+
    14. 1 row in set (0.00 sec)
    15. root@mysql.sock 21:12: [db1]>

    总结:创建索引区分度高的列要放在左边,这样才能最大划的过滤数据,提示索引的访问效率

    4 匹配左边列

    创建索引ind_test_2,age在左边第一列,insert_date在第二列

    alter table test add key ind_test_2(age,insert_date);

     使用列age进行查询,可以正常使用到索引

    1. root@mysql.sock 15:51: [db1]>explain select sum(length(name3)) from test where age<2;
    2. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_2 | ind_test_2 | 9 | NULL | 2000 | 100.00 | Using index condition |
    6. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+

    使用insert_date进行查询,没有使用上索引,因为insert_date在索引的第二列

    1. root@mysql.sock 11:23: [db1]>explain select sum(length(name3)) from test where insert_date>'2022-05-16 15:44:30' and insert_date<'2022-05-20 15:44:30';
    2. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    5. | 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199121 | 11.11 | Using where |
    6. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

    总结:在创建索引或者写sql语句的时候,需要在使用到的列放在索引的左边,MySQL5.7没有跳跃索引访问的功能,在MySQL 8.0实现了跳跃索引访问功能,但跳跃索引的性能肯定不及范围索引扫描

    5 匹配列左前缀

    创建索引ind_test_4

    alter table test add key ind_test_4(net_address);

    使用左边模糊匹配,可以使用上索引,并且过滤很高

    1. root@mysql.sock 11:37: [db1]>explain select sum(length(name3)) from test where net_address like 'aaa.%';
    2. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_4 | ind_test_4 | 203 | NULL | 4 | 100.00 | Using index condition |
    6. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    7. 1 row in set, 1 warning (0.00 sec)

    使用右边模糊匹配,没有使用索引,过滤只有11.11,执行效率下降

    1. root@mysql.sock 11:37: [db1]>explain select sum(length(name3)) from test where net_address like '%.cn';
    2. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    5. | 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199125 | 11.11 | Using where |
    6. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    7. 1 row in set, 1 warning (0.00 sec)

    总结:索引是按顺序存放的,如果只有列的右边进行模糊匹配,会导致访问效率降低,对于需要进行右边匹配的查询,建议将值倒序存放

    倒序存放方法

    1. ---倒序存放
    2. update test set net_address=reverse(net_address);
    3. select name3,reverse(net_address) from test where net_address like 'nc.%';
    4. root@mysql.sock 11:43: [db1]>select name3,reverse(net_address) from test where net_address like 'nc.%';
    5. +-------+----------------------+
    6. | name3 | reverse(net_address) |
    7. +-------+----------------------+
    8. | xx | aaa.xxxxx.cn |
    9. | xx | aaa.xxxxx.cn |
    10. +-------+----------------------+
    11. 2 rows in set (0.00 sec)
    12. root@mysql.sock 11:43: [db1]>explain select name3,reverse(net_address) from test where net_address like 'nc.%';
    13. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    14. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    15. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    16. | 1 | SIMPLE | test | NULL | range | ind_test_4 | ind_test_4 | 203 | NULL | 2 | 100.00 | Using index condition |
    17. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
    18. 1 row in set, 1 warning (0.00 sec)

    6 范围扫描后的数据是无序的

    创建组合索引ind_test_5,有列age,net_address,insert_date组成

    alter table test add key ind_test_5(age,net_address,insert_date);

    执行以下sql,等值查询之后按单个条件进行范围扫描,通过索引可以消除order by的排序没有使用filesort排序,说明索引等值查询之后,索引其他列的数据依然是有序的

    1. root@mysql.sock 11:54: [db1]>explain select insert_date
    2. from test
    3. where age=20 and net_address ='aaa.xxxxx.cn' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29'
    4. order by insert_date;
    5. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
    6. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    7. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
    8. | 1 | SIMPLE | test | NULL | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218 | NULL | 2 | 100.00 | Using where; Using index |
    9. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+

    执行以下sql,等值查询之后进行两个条件的范围扫描,可以发现使用了filesort文件排序,说明范围扫描之后,索引其他列的数据是乱序的

    1. root@mysql.sock 20:17: [db1]>explain select insert_date from test where age=20 and net_address like 'aaa.xxxxx.cn%' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29' order by insert_date;
    2. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218 | NULL | 2 | 2.50 | Using where; Using index; Using filesort |
    6. +----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
    7. 1 row in set, 1 warning (0.00 sec)
    8. root@mysql.sock 20:21: [db1]>

    总结:索引进行范围扫描之后的数据是乱序的,这时候的索引查询结果没法消除排序,需要进行filesort文件排序

    7 避免左边列函数运算

    创建索引ind_test_6

    alter table test add key ind_test_6(age);

    对age在左边进行运算age-2,可以发现sql没有使用上索引,索引失效

    1. root@mysql.sock 20:28: [db1]>explain select name from test where age-50=2;
    2. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
    5. | 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199121 | 100.00 | Using where |
    6. +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

    对age在右边进行运算50+2,可以发现sql正常使用上索引ind_test_6

    1. root@mysql.sock 20:28: [db1]>explain select name from test where age=50+2;
    2. +----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
    5. | 1 | SIMPLE | test | NULL | ref | ind_test_6 | ind_test_6 | 9 | const | 2000 | 100.00 | Using index condition |
    6. +----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+

    总结:避免对左边列进行函数或者算术运算,会导致索引失效

    8 避免隐式转化

    创建索引ind_test_7

    alter table test add key ind_test_7(insert_date1);

    按条件insert_date1进行范围查询,可以使用索引

    1. root@mysql.sock 20:56: [db1]>explain select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
    2. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
    5. | 1 | SIMPLE | test | NULL | range | ind_test_1 | ind_test_1 | 803 | NULL | 4000 | 100.00 | Using where; Using index |
    6. +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
    7. root@mysql.sock 20:56: [db1]>select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
    8. +----------+
    9. | count(*) |
    10. +----------+
    11. | 4000 |
    12. +----------+
    13. 1 row in set (0.00 sec)

    按条件insert_date1进行范围查询,将输入的number类型的值,可以使用索引,但执行时间增加为0.05秒,过滤下降到11.11,并且type类似为index,全索引扫描

    1. root@mysql.sock 20:56: [db1]>explain select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
    2. +----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
    3. | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    4. +----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
    5. | 1 | SIMPLE | test | NULL | index | ind_test_1 | ind_test_1 | 803 | NULL | 199100 | 11.11 | Using where; Using index |
    6. +----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
    7. root@mysql.sock 20:57: [db1]>select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
    8. +----------+
    9. | count(*) |
    10. +----------+
    11. | 4000 |
    12. +----------+
    13. 1 row in set (0.05 sec)

    总结:过滤条件要避免发生隐式转化,隐式转化会引发索引数据访问低效以及索引失效问题

  • 相关阅读:
    内核开发0 --- 代码规范
    Linux ubuntu 服务器部署详细教程
    linux(centos7)配置SSH免密登录
    【VUEX】最好用的传参方式--Vuex的详解
    探索C++赋值运算符重载的内部机制:手把手教你精通
    数字图像处理实验(二)|图像变换(附实验代码和截图)
    Java生成二维码
    golang iris框架 + linux后端运行
    汽车智能座舱/智能驾驶SOC -1
    Web前端vueDemo—实现天气预报功能(四)
  • 原文地址:https://blog.csdn.net/sinat_36757755/article/details/126562400