deadlock - Why do I still get deadlocks even with wsrep_retry_autocommit set very high?

Question

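I have a cluster of 3 Percona XtraDB 5.5.34-55 servers. Because all of them are writable, I get deadlock errors under any substantial load. Increasing the wsrep_retry_autocommit variable helped to some extent, but ER_LOCK_DEADLOCK did not go away completely. So I tried setting wsrep_retry_autocommit to 10000 (which appears to be the maximum), expecting that some queries would become very slow but none of them would fail with ER_LOCK_DEADLOCK: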

mysql-shm -ss -e 'show global variables like "%wsrep_retry_auto%"'
wsrep_retry_autocommit  10000

------------------------
LATEST DETECTED DEADLOCK
------------------------
140414 10:29:23
*** (1) TRANSACTION:
TRANSACTION 72D8, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 34, OS thread handle 0x7f11840d4700, query id 982 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of
table `shm`.`metric` trx id 72D8 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 72D7, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 3112, 141 row lock(s), undo log entries 40
MySQL thread id 50, OS thread handle 0x7f1184115700, query id 980 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0),('localhost','cpu-3/cpu-nice',8,0,0,0),
('localhost','cpu-3/cpu-system',8,0,0,0),('localhost','cpu-3/cpu-idle',8,0,0,0),
('localhost','cpu-3/cpu-wait',8,0,0,0),('localhost','cpu-3/cpu-interrupt',8,0,0,0),
('localhost','cpu-3/cpu-softirq',8,0,0,0),('localhost'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of table 
`shm`.`metric` trx id 72D7 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 504 index `unique-metric` of table 
`shm`.`metric` trx id 72D7 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)

Shouldn't the query have been retried? Is there a way to verify that Percona actually retried it 10000 times?

score 2 · Accepted Answer

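I don't have an exact answer to your question, but deadlocks will happen under any write-intensive load (for example, if you try to insert the same data concurrently, as the damned Drupal does). The only solution that has worked for me (I am still waiting for confirmation that it is a 100% OK solution) is to put HAProxy in front of the Galera nodes and define the first node in the HAProxy backend definition as the one in use, with the other 2 nodes as backups.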

This way, all MySQL traffic flows from the clients through HAProxy to a single Galera node, and if that node fails, one of the other nodes is used instead.
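
Roughly, the HAProxy definition could look like the sketch below. The section name, server names, and 192.0.2.x addresses are placeholders I made up for illustration, not values from a real setup. Note that a bare `check` only tests that the TCP port accepts connections; it does not know whether the Galera node is actually synced.

```
# Sketch only: names and addresses are hypothetical placeholders.
listen galera
    bind *:3306
    mode tcp
    # node1 receives all traffic while it is up
    server node1 192.0.2.11:3306 check
    # backup servers are used only when node1 is marked down
    server node2 192.0.2.12:3306 check backup
    server node3 192.0.2.13:3306 check backup
```

With several `backup` servers and no `option allbackups`, HAProxy sends traffic to the first available backup only, so there is still a single writer after a failover.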

Hope that helps... Andrea

score 0 · Accepted Answer

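Scalability is a problem with your answer: we are running a cluster, and using only one node is a poor use of resources. As an alternative, you can use any load balancer. If it is HAProxy, you can create 2 listeners on two ports, such as 3306 and 3305. The listener bound to 3306 takes all write requests from the application; its backend has node 1 active, with node 2 and node 3 as backups. The listener bound to 3305 takes all read requests from the application; its backend lists all nodes normally. This gives you read scalability with limited write scalability, and deadlocks can be reduced to a great extent.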
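
As a rough sketch of the two listeners, assuming the application can send writes to port 3306 and reads to port 3305. The section names, server names, and 192.0.2.x addresses are hypothetical placeholders, and `balance leastconn` is just one possible choice for spreading reads.

```
# Sketch only: names and addresses are hypothetical placeholders.

# Writes: a single active node, the other two only as backups
listen galera-writes
    bind *:3306
    mode tcp
    server node1 192.0.2.11:3306 check
    server node2 192.0.2.12:3306 check backup
    server node3 192.0.2.13:3306 check backup

# Reads: all three nodes share the load
listen galera-reads
    bind *:3305
    mode tcp
    balance leastconn
    server node1 192.0.2.11:3306 check
    server node2 192.0.2.12:3306 check
    server node3 192.0.2.13:3306 check
```

The split only helps if the application really routes its writes to one port and its reads to the other; HAProxy in TCP mode does not inspect the SQL.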
