mysql - How to debug Lock wait timeout exceeded on MySQL?

Question

In my production error log I occasionally see:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

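I know which query was trying to access the database at that moment, but is there a way to find out which query was holding the lock at that precise moment?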

score 280 · Accepted Answer

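What gives this away is the word transaction. It is evident from the statement that the query was attempting to change at least one row in one or more InnoDB tables.

Since you know the query, all the tables being accessed are candidates for being the culprit.

From there, you should be able to run SHOW ENGINE INNODB STATUS\G

You should be able to see the affected table(s).

You get all kinds of additional locking and mutex information.

Here is a sample from one of my clients: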

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
110514 19:44:14 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 4 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 9014315, signal count 7805377
Mutex spin waits 0, rounds 11487096053, OS waits 7756855
RW-shared spins 722142, OS waits 211221; RW-excl spins 787046, OS waits 39353
------------------------
LATEST FOREIGN KEY ERROR
------------------------
110507 21:41:35 Transaction:
TRANSACTION 0 606162814, ACTIVE 0 sec, process no 29956, OS thread id 1223895360 updating or deleting, thread declared inside InnoDB 499
mysql tables in use 1, locked 1
14 lock struct(s), heap size 3024, 8 row lock(s), undo log entries 1
MySQL thread id 3686635, query id 124164167 10.64.89.145 viget updating
DELETE FROM file WHERE file_id in ('6dbafa39-7f00-0001-51f2-412a450be5cc' )
Foreign key constraint fails for table `backoffice`.`attachment`:
,
  CONSTRAINT `attachment_ibfk_2` FOREIGN KEY (`file_id`) REFERENCES `file` (`file_id`)
Trying to delete or update in parent table, in index `PRIMARY` tuple:
DATA TUPLE: 17 fields;
 0: len 36; hex 36646261666133392d376630302d303030312d353166322d343132613435306265356363; asc 6dbafa39-7f00-0001-51f2-412a450be5cc;; 1: len 6; hex 000024214f7e; asc   $!O~;; 2: len 7; hex 000000400217bc; asc    @   ;; 3: len 2; hex 03e9; asc   ;; 4: len 2; hex 03e8; asc   ;; 5: len 36; hex 65666635323863622d376630302d303030312d336632662d353239626433653361333032; asc eff528cb-7f00-0001-3f2f-529bd3e3a302;; 6: len 40; hex 36646234376337652d376630302d303030312d353166322d3431326132346664656366352e6d7033; asc 6db47c7e-7f00-0001-51f2-412a24fdecf5.mp3;; 7: len 21; hex 416e67656c73204e6f7720436f6e666572656e6365; asc Angels Now Conference;; 8: len 34; hex 416e67656c73204e6f7720436f6e666572656e6365204a756c7920392c2032303131; asc Angels Now Conference July 9, 2011;; 9: len 1; hex 80; asc  ;; 10: len 8; hex 8000124a5262bdf4; asc    JRb  ;; 11: len 8; hex 8000124a57669dc3; asc    JWf  ;; 12: SQL NULL; 13: len 5; hex 8000012200; asc    " ;; 14: len 1; hex 80; asc  ;; 15: len 2; hex 83e8; asc   ;; 16: len 4; hex 8000000a; asc     ;;

But in child table `backoffice`.`attachment`, in index `PRIMARY`, there is a record:
PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 30; hex 36646261666133392d376630302d303030312d353166322d343132613435; asc 6dbafa39-7f00-0001-51f2-412a45;...(truncated); 1: len 30; hex 38666164663561652d376630302d303030312d326436612d636164326361; asc 8fadf5ae-7f00-0001-2d6a-cad2ca;...(truncated); 2: len 6; hex 00002297b3ff; asc   "   ;; 3: len 7; hex 80000040070110; asc    @   ;; 4: len 2; hex 0000; asc   ;; 5: len 30; hex 416e67656c73204e6f7720436f6e666572656e636520446f63756d656e74; asc Angels Now Conference Document;;

------------
TRANSACTIONS
------------
Trx id counter 0 620783814
Purge done for trx's n:o < 0 620783800 undo n:o < 0 0
History list length 35
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1192212800
MySQL thread id 5341758, query id 189708501 127.0.0.1 lwdba
show innodb status
---TRANSACTION 0 620783788, not started, process no 29956, OS thread id 1196472640
MySQL thread id 5341773, query id 189708353 10.64.89.143 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1223895360
MySQL thread id 5341667, query id 189706152 10.64.89.145 viget
---TRANSACTION 0 0, not started, process no 29956, OS thread id 1227888960
MySQL thread id 5341556, query id 189699857 172.16.135.63 lwdba
---TRANSACTION 0 620781112, not started, process no 29956, OS thread id 1222297920
MySQL thread id 5341511, query id 189696265 10.64.89.143 viget
---TRANSACTION 0 620783736, not started, process no 29956, OS thread id 1229752640
MySQL thread id 5339005, query id 189707998 10.64.89.144 viget
---TRANSACTION 0 620783785, not started, process no 29956, OS thread id 1198602560
MySQL thread id 5337583, query id 189708349 10.64.89.145 viget
---TRANSACTION 0 620783469, not started, process no 29956, OS thread id 1224161600
MySQL thread id 5333500, query id 189708478 10.64.89.144 viget
---TRANSACTION 0 620781240, not started, process no 29956, OS thread id 1198336320
MySQL thread id 5324256, query id 189708493 10.64.89.145 viget
---TRANSACTION 0 617458223, not started, process no 29956, OS thread id 1195141440
MySQL thread id 736, query id 175038790 Has read all relay log; waiting for the slave I/O thread to update it
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
519878 OS file reads, 18962880 OS file writes, 13349046 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 6.25 writes/s, 4.50 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 1190, seg size 1192,
174800 inserts, 174800 merged recs, 54439 merges
Hash table size 35401603, node heap has 35160 buffer(s)
0.50 hash searches/s, 11.75 non-hash searches/s
---
LOG
---
Log sequence number 28 1235093534
Log flushed up to   28 1235093534
Last checkpoint at  28 1235091275
0 pending log writes, 0 pending chkp writes
12262564 log i/o's done, 3.25 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 18909316674; in additional pool allocated 1048576
Dictionary memory allocated 2019632
Buffer pool size   1048576
Free buffers       175763
Database pages     837653
Modified db pages  6
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 770138, created 108485, written 7795318
0.00 reads/s, 0.00 creates/s, 4.25 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 29956, id 1185823040, state: sleeping
Number of rows inserted 6453767, updated 4602534, deleted 3638793, read 388349505551
0.25 inserts/s, 1.25 updates/s, 0.00 deletes/s, 2.75 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set, 1 warning (0.00 sec)

You should consider increasing InnoDB's lock wait timeout by setting innodb_lock_wait_timeout; the default is 50 seconds.

mysql> show variables like 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.01 sec)

You can set it permanently to a higher value in /etc/my.cnf with this line:

[mysqld]
innodb_lock_wait_timeout=120

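and restart mysql. If you cannot restart mysql at this time, run this (the new global value only applies to connections opened afterwards):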

SET GLOBAL innodb_lock_wait_timeout = 120;

You could also set it just for the duration of your session:

SET innodb_lock_wait_timeout = 120;

followed by your query.
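If you set the timeout from application code rather than from the mysql client, a minimal JDBC sketch could look like this. The class LockWaitTimeout and its methods are hypothetical; the range check assumes the documented limits of innodb_lock_wait_timeout (1 to 1073741824 seconds), and the session form is used so that only the connection running the long transaction is affected.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: build and apply a SET ... innodb_lock_wait_timeout statement.
final class LockWaitTimeout {
    // Documented range of innodb_lock_wait_timeout, in seconds (assumption: check your version).
    static final long MIN_SECONDS = 1;
    static final long MAX_SECONDS = 1_073_741_824L;

    private LockWaitTimeout() {}

    /** Returns e.g. "SET SESSION innodb_lock_wait_timeout = 120". */
    static String statement(long seconds, boolean global) {
        if (seconds < MIN_SECONDS || seconds > MAX_SECONDS) {
            throw new IllegalArgumentException("timeout out of range: " + seconds);
        }
        String scope = global ? "GLOBAL" : "SESSION";
        return "SET " + scope + " innodb_lock_wait_timeout = " + seconds;
    }

    /** Raises the timeout for this connection only; call it before starting the transaction. */
    static void applyToSession(Connection con, long seconds) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute(statement(seconds, false));
        }
    }
}
```

Building the string in one place keeps the numeric value validated instead of concatenating arbitrary input into SQL.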

score 97 · Accepted Answer

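As someone mentioned in one of the many SO threads about this problem: sometimes the process that has locked the table shows up as sleeping in the processlist! I was tearing my hair out until I killed all the sleeping threads that were open in the database in question (none were active at the time). That finally unlocked the table and let the update query run.

The commenter said something like "Sometimes a MySQL thread locks a table, then sleeps while it waits for something non-MySQL-related to happen."

After re-reviewing the show engine innodb status log (once I had tracked down the client responsible for the lock), I noticed that the stuck thread in question was listed at the very bottom of the transaction list, beneath the active queries that were about to error out because of the frozen lock: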

------------------
---TRANSACTION 2744943820, ACTIVE 1154 sec(!!)
2 lock struct(s), heap size 376, 2 row lock(s), undo log entries 1
MySQL thread id 276558, OS thread handle 0x7f93762e7710, query id 59264109 [ip] [database] cleaning up
Trx read view will not see trx with id >= 2744943821, sees < 2744943821

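(I'm not sure whether the "Trx read view" message is related to the frozen lock, but unlike the other active transactions, this one does not show the query that was issued; instead it claims the transaction is "cleaning up", yet it holds multiple row locks.)

The moral of the story is that a transaction can be active even though the thread is sleeping.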
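Because such a transaction sits at the bottom of the list with no query attached, it can help to scan the status text programmatically for transactions that have been ACTIVE for a long time. The following is a minimal Java sketch, assuming the plain-text layout shown above (a `---TRANSACTION ..., ACTIVE N sec` header followed later by a `MySQL thread id N` line); the class LongTransactionFinder is hypothetical, and the exact format can differ between MySQL versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: find transactions in SHOW ENGINE INNODB STATUS output that
// have been ACTIVE for at least minSeconds, together with their MySQL thread id.
final class LongTransactionFinder {
    record ActiveTrx(String trxId, long activeSeconds, long mysqlThreadId) {}

    // Matches "---TRANSACTION 2744943820, ACTIVE 1154 sec" and the older
    // two-part id form "---TRANSACTION 0 620783788, ACTIVE 5 sec".
    private static final Pattern ACTIVE_TRX =
            Pattern.compile("^---TRANSACTION ([0-9 ]+), ACTIVE (\\d+) sec");
    private static final Pattern THREAD = Pattern.compile("^MySQL thread id (\\d+)");

    private LongTransactionFinder() {}

    static List<ActiveTrx> find(String status, long minSeconds) {
        List<ActiveTrx> result = new ArrayList<>();
        String trxId = null; // id of the ACTIVE transaction currently being read, if any
        long seconds = 0;
        for (String line : status.split("\\R")) {
            if (line.startsWith("---TRANSACTION")) {
                Matcher t = ACTIVE_TRX.matcher(line);
                trxId = t.find() ? t.group(1).trim() : null; // "not started" -> ignored
                seconds = trxId != null ? Long.parseLong(t.group(2)) : 0;
                continue;
            }
            Matcher m = THREAD.matcher(line);
            if (trxId != null && m.find()) {
                if (seconds >= minSeconds) {
                    result.add(new ActiveTrx(trxId, seconds, Long.parseLong(m.group(1))));
                }
                trxId = null; // only the first thread line after a header belongs to it
            }
        }
        return result;
    }
}
```

Feeding it the output captured at the time of the error (for example with a threshold equal to innodb_lock_wait_timeout) would list candidates such as the transaction that had been ACTIVE for 1154 seconds, along with the thread id to investigate.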

score 45 · Accepted Answer

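The more contention you have, the greater the chance of deadlocks, which the DB engine resolves by aborting (rolling back) one of the deadlocked transactions.

Also, long-running transactions that have modified (e.g. UPDATE or DELETE) a large number of entries are more likely to conflict with other transactions.

Although InnoDB uses MVCC, you can still request explicit locks using the FOR UPDATE clause. However, unlike other popular databases (Oracle, MSSQL, PostgreSQL, DB2), MySQL uses REPEATABLE_READ as the default isolation level.

Now, the locks you acquire (either by modifying rows or by using explicit locking) are held for the duration of the currently running transaction. If you want a good explanation of the difference between REPEATABLE_READ and READ COMMITTED with regard to locking, please read this Percona article.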

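In REPEATABLE READ every lock acquired during a transaction is held for the duration of the transaction.

In READ COMMITTED the locks that did not match the scan are released after the STATEMENT completes.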

...

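This means that in READ COMMITTED, once the UPDATE statement completes, other transactions are free to update rows that they would not have been able to update in REPEATABLE READ.

Therefore: the more restrictive the isolation level (REPEATABLE_READ, SERIALIZABLE), the greater the chance of deadlock. This is not an issue "per se", it's a trade-off.

You can get very good results with READ_COMMITTED, since you need application-level lost update prevention anyway when using logical transactions that span multiple HTTP requests. The optimistic locking approach targets lost updates that might happen even if you use the SERIALIZABLE isolation level, while reducing lock contention by allowing you to use READ_COMMITTED.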
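The optimistic locking idea can be illustrated without a database: each row carries a version number, and an update only succeeds if the version is still the one that was read. The in-memory Java sketch below is an illustration only; VersionedStore and its version column are hypothetical. In MySQL the same check would be the WHERE clause of an UPDATE ... WHERE id = ? AND version = ?, with an affected-row count of 0 signalling a conflict.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a version column.
final class VersionedStore {
    static final class Row {
        final String data;
        final long version;
        Row(String data, long version) { this.data = data; this.version = version; }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String data) { rows.put(id, new Row(data, 0)); }

    synchronized Row read(long id) { return rows.get(id); }

    /**
     * Mirrors: UPDATE t SET data = ?, version = version + 1 WHERE id = ? AND version = ?
     * Returns the number of "affected rows"; 0 means another writer got there first,
     * so the caller must re-read instead of silently overwriting (a prevented lost update).
     */
    synchronized int update(long id, String newData, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return 0;
        }
        rows.put(id, new Row(newData, expectedVersion + 1));
        return 1;
    }
}
```

Two users who read version 0 can both attempt an update, but only the first succeeds; no row lock has to be held between the read and the write, which is what keeps contention low under READ_COMMITTED.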

score 21 · Accepted Answer

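For the record, the lock wait timeout exception also happens when there is a deadlock that MySQL cannot detect, so it just times out. Another cause might be an extremely long-running query, but that is easier to solve/repair, and I will not describe that case here.

MySQL is usually able to deal with deadlocks if they are constructed "properly" within two transactions. MySQL then simply kills/rolls back the one transaction that owns fewer locks (the less important one, since it affects fewer rows) and lets the other one finish.

Now, let's suppose there are two processes A and B and 3 transactions: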

Process A Transaction 1: Locks X
Process B Transaction 2: Locks Y
Process A Transaction 3: Needs Y => Waits for Y
Process B Transaction 2: Needs X => Waits for X
Process A Transaction 1: Waits for Transaction 3 to finish

(see the last two paragraphs below for a more detailed explanation of these terms)

=> deadlock

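This is a very unfortunate setup, because MySQL cannot see the deadlock (it spans 3 transactions). So what MySQL does is ... nothing! It just waits, since it does not know what to do. It waits until the first acquired lock exceeds the timeout (Process A Transaction 1: Locks X); this then unblocks Lock X, which unblocks Transaction 2, and so on.

The art is to find out what (which query) causes the first lock (Lock X). You will easily be able to see (show engine innodb status) that Transaction 3 is waiting for Transaction 2, but you will not see which transaction Transaction 2 is waiting for (Transaction 1). MySQL will not print any locks or queries associated with Transaction 1. The only hint is that at the very bottom of the transaction list (in the show engine innodb status printout), you will see Transaction 1 apparently doing nothing (but in fact waiting for Transaction 3 to finish).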
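The situation can be pictured as a wait-for graph, in which a deadlock is a cycle. The following Java sketch is a generic illustration of that idea, not a model of how InnoDB's deadlock detector is implemented; WaitForGraph is a hypothetical class. With only the two lock waits the server can see (T3 waits for T2, T2 waits for T1) there is no cycle; the cycle appears only once the edge T1 -> T3, which exists because both transactions belong to Process A, is added.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wait-for graph: an edge "a -> b" means transaction a waits for b.
// A deadlock corresponds to a cycle in this graph.
final class WaitForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    boolean hasCycle() {
        Set<String> done = new HashSet<>();   // nodes whose descendants are fully explored
        Set<String> onPath = new HashSet<>(); // nodes on the current DFS path
        for (String node : edges.keySet()) {
            if (dfs(node, done, onPath)) return true;
        }
        return false;
    }

    private boolean dfs(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;  // back edge: cycle found
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Set.of())) {
            if (dfs(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }
}
```

The missing edge is exactly the information the status output does not show, which is why the server can only wait for the timeout.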

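The technique for finding out which SQL query caused the lock (Lock X) to be granted, for a given transaction that is waiting, is described here: Tracking MySQL query history in long running transactions

If you are wondering what exactly the processes and transactions in the example are: the process is a PHP process, and a transaction is a transaction as defined by the innodb-trx-table. In my case, I had two PHP processes, and in each one I started a transaction manually. The interesting part was that even though I started one transaction in a process, MySQL internally used two separate transactions (I have no idea why; maybe some MySQL dev can explain).

MySQL manages its own transactions internally and decided (in my case) to use two transactions to handle all the SQL requests coming from the PHP process (Process A). The statement that Transaction 1 is waiting for Transaction 3 to finish refers to this MySQL-internal behavior. MySQL "knew" that Transaction 1 and Transaction 3 were actually instantiated as part of one "transaction" request (from Process A). Now the whole "transaction" was blocked because Transaction 3 (a subpart of the "transaction") was blocked. Because the "transaction" could not finish, Transaction 1 (also a subpart of the "transaction") was marked as not finished either. This is what I meant by "Transaction 1 waits for Transaction 3 to finish".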

score 16 · Accepted Answer

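The biggest problem with this exception is that it is usually not reproducible in a test environment, and when it happens on prod we are not around to run the innodb engine status. So in one of the projects I put the code below into the catch block for this exception. It captures the engine status at the moment the exception happens, and that helped a lot.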

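// In the catch block for the lock wait timeout; requires the PROCESS privilege
try (Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SHOW ENGINE INNODB STATUS")) {
    while (rs.next()) {
        // The result has three columns: Type, Name and Status (the full report)
        log.info(rs.getString("Type") + " " + rs.getString("Name"));
        log.info(rs.getString("Status"));
    }
}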
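To dump the status only when it is relevant, the catch block needs to recognise the lock wait timeout. A minimal sketch, assuming the driver exposes MySQL's vendor error code 1205 (ER_LOCK_WAIT_TIMEOUT) through SQLException.getErrorCode(); the vendor code is checked rather than the SQLSTATE because the SQLSTATE reported for this error may vary by driver (the question's log shows HY000). LockErrors is a hypothetical helper name.

```java
import java.sql.SQLException;

// Hypothetical helper: is this exception (or one of its causes) MySQL error 1205?
final class LockErrors {
    /** MySQL server error ER_LOCK_WAIT_TIMEOUT. */
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    private LockErrors() {}

    static boolean isLockWaitTimeout(Throwable t) {
        // Walk the cause chain in case a framework wrapped the driver's SQLException.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SQLException
                    && ((SQLException) cur).getErrorCode() == ER_LOCK_WAIT_TIMEOUT) {
                return true;
            }
        }
        return false;
    }
}
```

The catch block would then run the SHOW ENGINE INNODB STATUS dump only when isLockWaitTimeout(e) returns true.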

score 13 · Accepted Answer

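This is what I ultimately had to do to figure out which "other query" caused the lock timeout problem. In the application code, we track all pending database calls on a separate thread dedicated to this task. If any DB call takes longer than N seconds (for us it is 30 seconds), we log: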

-- Pending InnoDB transactions
SELECT * FROM information_schema.innodb_trx ORDER BY trx_started; 

-- Optionally, log what transaction holds what locks
-- (this table was removed in MySQL 8.0; use performance_schema.data_locks there)
SELECT * FROM information_schema.innodb_locks;

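With the above, we were able to pinpoint the concurrent queries that locked the rows causing the deadlock. In my case, they were INSERT ... SELECT statements, which, unlike plain SELECTs, lock the underlying rows. You can then reorganize the code or use a different transaction isolation level, such as read uncommitted.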
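The bookkeeping side of that approach might look like the following Java sketch. It is an assumption about how such a tracker could be structured: the class PendingCallTracker is hypothetical, and the explicit Instant parameters are there to make it testable. A watchdog thread would call overdue(Instant.now(), Duration.ofSeconds(30)) periodically and, when it returns anything, run the two queries above and log the results next to the overdue SQL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: callers register a DB call before executing it and
// complete it afterwards; a watchdog thread asks which calls are overdue.
final class PendingCallTracker {
    record Call(long id, String sql, Instant started) {}

    private final Map<Long, Call> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    long begin(String sql, Instant now) {
        long id = nextId.incrementAndGet();
        pending.put(id, new Call(id, sql, now));
        return id;
    }

    void end(long id) { pending.remove(id); }

    /** Calls that have been running for longer than threshold at time now. */
    List<Call> overdue(Instant now, Duration threshold) {
        List<Call> result = new ArrayList<>();
        for (Call c : pending.values()) {
            if (Duration.between(c.started(), now).compareTo(threshold) > 0) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Using a ConcurrentHashMap lets the request threads and the watchdog thread touch the pending set without extra locking.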

Good luck!

score 12 · Accepted Answer

Extrapolating from Rolando's answer above, it is these that are blocking your query:

---TRANSACTION 0 620783788, not started, process no 29956, OS thread id 1196472640
MySQL thread id 5341773, query id 189708353 10.64.89.143 viget

If you need to execute your query and can't wait for the others to run, kill them off using the MySQL thread id:

kill 5341773 <replace with your thread id>

(from within mysql, obviously, not the shell)

You have to find the thread IDs from the output of the

show engine innodb status\G

command, and figure out which one is blocking the database.
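If you do this often, extracting the id can be scripted. A minimal Java sketch, assuming a transaction block has been copied from the status output as text; KillStatement is a hypothetical name. It deliberately matches "MySQL thread id" and not "OS thread id", since only the former is the id that KILL expects. Plain KILL (KILL CONNECTION) ends the connection and rolls back its open transaction, releasing its locks, whereas KILL QUERY only stops the running statement and would leave a sleeping transaction's locks in place.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: build "KILL <id>" from a transaction block copied out of
// SHOW ENGINE INNODB STATUS.
final class KillStatement {
    private static final Pattern THREAD_ID = Pattern.compile("MySQL thread id (\\d+)");

    private KillStatement() {}

    static Optional<String> forTransactionBlock(String block) {
        Matcher m = THREAD_ID.matcher(block);
        // KILL <id> terminates the whole connection, rolling back its transaction.
        return m.find() ? Optional.of("KILL " + m.group(1)) : Optional.empty();
    }
}
```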

score 12 · Accepted Answer

Take a look at the man page of the pt-deadlock-logger utility:

brew install percona-toolkit
pt-deadlock-logger --ask-pass server_name

It extracts information from the engine innodb status output mentioned above, and it can also be used to create a daemon that runs every 30 seconds.

score 9 · Accepted Answer

You can use:

show full processlist

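It will list all the connections in MySQL, with the current state of each connection as well as the query being executed. There is also a shorter variant, show processlist;, which displays truncated queries along with the connection stats.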

score 3 · Accepted Answer

If you are using JDBC, you have the option
includeInnodbStatusInDeadlockExceptions=true

https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html
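A small sketch of enabling that option by appending it to a Connector/J URL; JdbcUrls is a hypothetical helper, and the host, port and database in the tests are placeholders. Whether the status is also attached to lock wait timeout errors, and not only to deadlock errors, is worth confirming in the Connector/J documentation for your driver version.

```java
// Hypothetical helper: append includeInnodbStatusInDeadlockExceptions=true to a JDBC URL.
final class JdbcUrls {
    private JdbcUrls() {}

    static String withInnodbStatusInDeadlockExceptions(String url) {
        // '?' starts the property list; '&' separates further properties.
        String sep = url.contains("?") ? "&" : "?";
        return url + sep + "includeInnodbStatusInDeadlockExceptions=true";
    }
}
```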

score -1 · Accepted Answer

Activate the MySQL general.log (disk-intensive) and use mysql_analysis_general_log.pl to extract long-running transactions, for example with:

--min-duration=<your innodb_lock_wait_timeout value>

Disable the general.log afterwards.
