故障分析 | MySQL锁等待超时一例分析

作者：付祥

现居珠海，主要负责 Oracle、MySQL、mongoDB 和 Redis 维护工作。

本文来源：原创投稿

*爱可生开源社区出品，原创内容未经授权不得随意使用，转载请联系小编并注明来源。

1、问题现象

开发反馈某业务持续性报锁等待超时，相关错误信息如下：

Lock wait timeout exceeded; try restarting transaction
1

为了能精确定位问题，继续询问开发有没有锁等待超时相关SQL，开发又给了相关报错SQL：

INSERT INTO  VALUES(...)
1

2、分析诊断

根据错误信息得知，单条insert语句锁等待超时，如果都是单条insert插入，不应该频繁报锁超时，似乎有点不寻常，当前数据库版本为5.6，锁等待超时参数设置时长30秒：

root@ (none)> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 30 |
+--------------------------+-------+
1
2
3
4
5
6

查看慢日志及 show engine innodb status\G ，发现有批量插入动作，由于自增锁竞争产生死锁：

询问开发，批量插入SQL为定时作业，查看当前innodb_autoinc_lock_mode参数设置：

+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1 |
+--------------------------+-------+
1
2
3
4
5

innodb_autoinc_lock_mode=1，对于批量插入语句，需要等到语句执行结束才释放自增锁，故要解决锁等待超时，可以将参数值设置为2，但该参数为静态参数需要重启MySQL才能生效，不能重启情况下只能优化SQL执行时间，查看慢日志得知SQL执行一次需要100+秒，扫描行数86w，结果集却为0，说明SQL有优化空间：

Query_time: 108.527499 Lock_time: 0.000342 Rows_sent: 0 Rows_examined: 862584
1

分析SQL执行计划：

SELECT *
from ( SELECT * from aa WHERE add_time >= '2022-10-01' ) a
left JOIN ( SELECT * from bb WHERE add_time >= '2022-10-01' ) b
on a.account = b.accountb and a.end_time = b.end_timeb and a.app_id = b.app_idb WHERE
b.accountb is null;
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
| 1 | PRIMARY |  | ALL | NULL | NULL | NULL |
NULL | 2722 | NULL |
| 1 | PRIMARY |  | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | bb | ALL | NULL | NULL | NULL |
NULL | 595248 | Using where |
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time | 6 |
NULL | 2722 | Using index condition |
+----+-------------+----------------+-------+---------------+--------------+---------
+------+--------+----------------------------------------------------+
4 rows in set (0.00 sec)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

SQL有子查询，使用到了派生表，首先执行子查询

( SELECT * from aa WHERE add_time >= '2022-10-01' )
1

将结果集存入临时表derived2 ，然后执行子查询

( SELECT **** from bb WHERE add_time >= '2022-10-01' )
1

将结果集存入临时表derived3，最后derived2和derived3根据关联条件做表关联，使用Block Nested Loop算法，即使表chat_black（account , app_id , end_time）列有复合索引也使用不到。

如果mysql版本是5.7的话，optimizer_switch 参数会增加一个选项：derived_merge=on ，满足一定条件，即子查询中没有如下条件：

Aggregate functions ( SUM() , MIN() , MAX() , COUNT() , and so forth)
DISTINCT
GROUP BY
HAVING
LIMIT
UNION or UNION ALL
Subqueries in the select list
Assignments to user variables
Refererences only to literal values (in this case, there is no underlying table)

子查询将被合并到外层查询。

3、问题解决

知道SQL慢的原因后，对SQL进行改写，执行计划如下：

SELECT * FROM (
select * from aa where add_time >= '2022-10-01') a
left join bb b
on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time =
b.end_time and a.app_id = b.app_id)
where b.account is null;
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+-----------------------+
| 1 | PRIMARY |  | ALL | NULL | NULL
| NULL | NULL | 3096 | NULL |
| 1 | PRIMARY | b | ref | idx_ac_app_id_end_time |
idx_ac_app_id_end_time | 1542 | a.account,a.app_id,a.end_time | 1 | Using where
|
| 2 | DERIVED | aa | range | idx_add_time | idx_add_time
| 6 | NULL | 3096 | Using index condition |
+----+-------------+----------------+-------+------------------------+----------------
--------+---------+-------------------------------+------+---------------------
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

执行时间从原来的100+秒降低不到1秒：

root@ xsj_chat_filter> SELECT count(*) FROM (
-> select * from aa where add_time >= '2022-10-01') a
-> left join bb b
-> on ( b.add_time >= '2022-10-01' and a.account = b.account and a.end_time
= b.end_time and a.app_id = b.app_id)
-> where b.account is null;
+----------+
| count(*) |
+----------+
| 23       |
+----------+
1 row in set (0.65 sec)
1
2
3
4
5
6
7
8
9
10
11
12

执行时间短了，自然就不存在自增锁等待超时了。

相关阅读:
JavaSE 基础（十三）网络编程
 【每日一练】勾股定理困难版
 【开源】基于SpringBoot的衣物搭配系统的设计和实现
 线程可重复使用，程序开发是如何使用线程池的呢？
【数据结构-树】哈夫曼树
 Flutter学习4 - Dart数据类型
 年薪高达50W的测开，到底是做什么的？
记一次 .NET某游戏后端API服务 CPU爆高分析
 蓝桥杯2024年第十五届省赛真题-小球反弹
 Java常用类
原文地址：https://blog.csdn.net/ActionTech/article/details/127885171