首先我们在日常的开发中,大概率会使用批量更新,或者在一个事务里面做增删改查,那么就有可能不同事务之间导致死锁的发生。这里主要讲的是如何将当时发生死锁的信息输出到日志文件中,以及具体的SQL打印。
查了很多网上的文章,都是使用什么下面之类的命令
- show engine innodb status \G;
- SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;
但是我发现这些命令都是查正在死锁的语句,并不是历史死锁的语句,想想我们在日常工作中包括生产环境,如果发生了死锁,应用监控告警了,那么我们应该去找的历史死锁日志。
言归正传,那么如何找到历史的死锁日志。
确认错误日志打印路径,死锁的日志都会打印到这个错误日志文件,是可以看到历史日志的关键
通过命令 show global variables like "log_error%";查看错误日志路径
MySQL有死锁检测机制,需要打开死锁检测机制(网上说死锁检测机制比较耗费性能,这个看个人取舍,我个人认为在现在的硬件环境支持下,这一点性能损耗是可以接受的)
- show variables like 'innodb_deadlock_detect';
- set GLOBAL innodb_deadlock_detect = ON;
-
- 当然你也可以在你的My.ini配置文件中写死(推荐)
- innodb_deadlock_detect = ON;
什么是MySQL的Monitor不在这里细说,想要了解的朋友可以参考InnoDB Monitor(innodb监视器)_yimenglin的博客-CSDN博客_innodb_monitor
- show global variables like "innodb_status_output%";
-
- set GLOBAL innodb_status_output = ON
- set GLOBAL innodb_status_output_locks = ON
-
- 同样最好在My.ini进行配置写死,免得每次重启服务器都要重新设置
- innodb_status_output = ON
- innodb_status_output_locks = ON
(非必要,博主发现不开这个参数也能实现打印历史的锁日志,如果你为了锁的日志更全面,也可以开启)
- show global variables like "innodb_print_all_deadlocks%";
- set GLOBAL innodb_print_all_deadlocks = ON;
-
- 同样建议在my.ini配置文件中写死
-
- innodb_print_all_deadlocks = ON
将上面的配置开启后,如果MySQL遇见死锁,则会帮你检测出来,并且回滚权重比较小的事务。这样另外一个事务就能往下执行(相当于自动帮你解除死锁)
首先找到模拟一波
开启两个事务
交叉update id = 1 和 id = 2的记录,其中事务1出现了死锁。
Deadlock found when trying to get lock; try restarting transaction
同时也证明了死锁检测的好处,会自动帮你回滚其中一个事务的操作,让另外一个竞争的事务能够往下执行。
接下来才是我们的重头戏,我们如何查找错误日志。
根据第一步错误日志的文件
- 使用关键词INNODB MONITOR OUTPUT对日志进行过滤,一般会有开头和结束,例如我这次模拟的错的日志文件如下:
-
- 2022-11-10 07:46:26 0x6ef4 INNODB MONITOR OUTPUT
- =====================================
- Per second averages calculated from the last 20 seconds
- -----------------
- BACKGROUND THREAD
- -----------------
- srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 131 srv_idle
- srv_master_thread log flush and writes: 0
- ----------
- SEMAPHORES
- ----------
- OS WAIT ARRAY INFO: reservation count 3
- OS WAIT ARRAY INFO: signal count 3
- RW-shared spins 0, rounds 0, OS waits 0
- RW-excl spins 0, rounds 0, OS waits 0
- RW-sx spins 0, rounds 0, OS waits 0
- Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
- ------------------------
- LATEST DETECTED DEADLOCK
- ------------------------
- 2022-11-10 07:46:22 0x7a38
- *** (1) TRANSACTION:
- TRANSACTION 273418, ACTIVE 25 sec starting index read
- mysql tables in use 1, locked 1
- LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
- MySQL thread id 9, OS thread handle 4580, query id 72 localhost 127.0.0.1 root updating
- update foo set name = '123' where id = 2
-
- *** (1) HOLDS THE LOCK(S):
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
- Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000001; asc ;;
- 1: len 6; hex 000000042c0a; asc , ;;
- 2: len 7; hex 02000000b726d4; asc & ;;
- 3: len 3; hex 313233; asc 123;;
-
-
- *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap waiting
- Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000002; asc ;;
- 1: len 6; hex 00000004260a; asc & ;;
- 2: len 7; hex 02000000b725b8; asc % ;;
- 3: len 4; hex 33343536; asc 3456;;
-
-
- *** (2) TRANSACTION:
- TRANSACTION 273419, ACTIVE 19 sec starting index read
- mysql tables in use 1, locked 1
- LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
- MySQL thread id 10, OS thread handle 18376, query id 73 localhost 127.0.0.1 root updating
- update foo set name = '3456' where id = 1
-
- *** (2) HOLDS THE LOCK(S):
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273419 lock_mode X locks rec but not gap
- Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000002; asc ;;
- 1: len 6; hex 00000004260a; asc & ;;
- 2: len 7; hex 02000000b725b8; asc % ;;
- 3: len 4; hex 33343536; asc 3456;;
-
-
- *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273419 lock_mode X locks rec but not gap waiting
- Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000001; asc ;;
- 1: len 6; hex 000000042c0a; asc , ;;
- 2: len 7; hex 02000000b726d4; asc & ;;
- 3: len 3; hex 313233; asc 123;;
-
- *** WE ROLL BACK TRANSACTION (2)
- ------------
- TRANSACTIONS
- ------------
- Trx id counter 273420
- Purge done for trx's n:o < 273414 undo n:o < 0 state: running but idle
- History list length 0
- LIST OF TRANSACTIONS FOR EACH SESSION:
- ---TRANSACTION 284451204563104, not started
- 0 lock struct(s), heap size 1136, 0 row lock(s)
- ---TRANSACTION 284451204561440, not started
- 0 lock struct(s), heap size 1136, 0 row lock(s)
- ---TRANSACTION 284451204560608, not started
- 0 lock struct(s), heap size 1136, 0 row lock(s)
- ---TRANSACTION 273418, ACTIVE 29 sec
- 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
- MySQL thread id 9, OS thread handle 4580, query id 72 localhost 127.0.0.1 root
- TABLE LOCK table `wechat_min_pro`.`foo` trx id 273418 lock mode IX
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
- Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000001; asc ;;
- 1: len 6; hex 000000042c0a; asc , ;;
- 2: len 7; hex 02000000b726d4; asc & ;;
- 3: len 3; hex 313233; asc 123;;
- RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
- Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
- 0: len 4; hex 00000002; asc ;;
- 1: len 6; hex 000000042c0a; asc , ;;
- 2: len 7; hex 02000000b726f6; asc & ;;
- 3: len 3; hex 313233; asc 123;;
- --------
- FILE I/O
- --------
- I/O thread 0 state: wait Windows aio (insert buffer thread)
- I/O thread 1 state: wait Windows aio (log thread)
- I/O thread 2 state: wait Windows aio (read thread)
- I/O thread 3 state: wait Windows aio (read thread)
- I/O thread 4 state: wait Windows aio (read thread)
- I/O thread 5 state: wait Windows aio (read thread)
- I/O thread 6 state: wait Windows aio (write thread)
- I/O thread 7 state: wait Windows aio (write thread)
- I/O thread 8 state: wait Windows aio (write thread)
- I/O thread 9 state: wait Windows aio (write thread)
- Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
- ibuf aio reads:, log i/o's:, sync i/o's:
- Pending flushes (fsync) log: 0; buffer pool: 0
- 1500 OS file reads, 232 OS file writes, 55 OS fsyncs
- 13.15 reads/s, 16384 avg bytes/read, 0.75 writes/s, 0.40 fsyncs/s
- -------------------------------------
- INSERT BUFFER AND ADAPTIVE HASH INDEX
- -------------------------------------
- Ibuf: size 1, free list len 195, seg size 197, 0 merges
- merged operations:
- insert 0, delete mark 0, delete 0
- discarded operations:
- insert 0, delete mark 0, delete 0
- Hash table size 2267, node heap has 1 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 0 buffer(s)
- Hash table size 2267, node heap has 1 buffer(s)
- 0.00 hash searches/s, 0.15 non-hash searches/s
- ---
- LOG
- ---
- Log sequence number 6117561793
- Log buffer assigned up to 6117561793
- Log buffer completed up to 6117561793
- Log written up to 6117561793
- Log flushed up to 6117561793
- Added dirty pages up to 6117561793
- Pages flushed up to 6117561793
- Last checkpoint at 6117561793
- 21 log i/o's done, 0.10 log i/o's/second
- ----------------------
- BUFFER POOL AND MEMORY
- ----------------------
- Total large memory allocated 8585216
- Dictionary memory allocated 388960
- Buffer pool size 512
- Free buffers 252
- Database pages 258
- Old database pages 0
- Modified db pages 0
- Pending reads 0
- Pending writes: LRU 0, flush list 0, single page 0
- Pages made young 0, not young 0
- 0.00 youngs/s, 0.00 non-youngs/s
- Pages read 1475, created 142, written 166
- 0.00 reads/s, 0.00 creates/s, 0.00 writes/s
- Buffer pool hit rate 353 / 1000, young-making rate 0 / 1000 not 0 / 1000
- Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
- LRU len: 258, unzip_LRU len: 0
- I/O sum[291]:cur[2], unzip sum[0]:cur[0]
- --------------
- ROW OPERATIONS
- --------------
- 0 queries inside InnoDB, 0 queries in queue
- 0 read views open inside InnoDB
- Process ID=9120, Main thread ID=00000000000053B0 , state=sleeping
- Number of rows inserted 0, updated 2, deleted 0, read 3
- 0.00 inserts/s, 0.05 updates/s, 0.00 deletes/s, 0.05 reads/s
- Number of system rows inserted 0, updated 315, deleted 0, read 5798
- 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
- ----------------------------
- END OF INNODB MONITOR OUTPUT
- ============================
我们发现产生死锁的两条SQL语句都被输出出来了
update foo set name = '123' where id = 2
update foo set name = '3456' where id = 1
定位出具体的SQL,那么我们就可以在我们的代码上进行优化了,具体优化思路可以参照我另外一篇博客
记录一次生产环境MySQL死锁以及解决思路_小胖子——鑫的博客-CSDN博客
在生产环境日志,我这边是建议开启死锁检测、以及锁日志打印的,因为一旦发生死锁,首先死锁检测能帮助我们解除死锁,其次我们能够快速的定位具体的SQL以及代码问题,比起牺牲的性能来说,我们排查问题的效率就大大提升了。