我有一个由 3 个 percona xtradb 5.5.34-55 服务器组成的集群,因为它们都是可写的,所以在任何大量负载下都会出现死锁错误。增加wsrep_retry_autocommit
变量在一定程度上有所帮助,但ER_LOCK_DEADLOCK
并没有完全消失。所以我尝试设置wsrep_retry_autocommit
为 10000(似乎是最大值),认为它会使一些查询非常慢,但它们都不会失败ER_LOCK_DEADLOCK
:
mysql-shm -ss -e 'show global variables like "%wsrep_retry_auto%"'
wsrep_retry_autocommit 10000
------------------------
LATEST DETECTED DEADLOCK
------------------------
140414 10:29:23
*** (1) TRANSACTION:
TRANSACTION 72D8, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 34, OS thread handle 0x7f11840d4700, query id 982 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of
table `shm`.`metric` trx id 72D8 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 72D7, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 3112, 141 row lock(s), undo log entries 40
MySQL thread id 50, OS thread handle 0x7f1184115700, query id 980 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0),('localhost','cpu-3/cpu-nice',8,0,0,0),
('localhost','cpu-3/cpu-system',8,0,0,0),('localhost','cpu-3/cpu-idle',8,0,0,0),
('localhost','cpu-3/cpu-wait',8,0,0,0),('localhost','cpu-3/cpu-interrupt',8,0,0,0),
('localhost','cpu-3/cpu-softirq',8,0,0,0),('localhost'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 504 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
不应该重试吗?有没有办法验证 percona 实际重试了 10000 次查询?