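I have been fighting MySQL deadlocks for a while now. We have a number of tables that log data, each with an AFTER INSERT trigger that rolls per-minute statistics/aggregates into a separate summary table. Obviously, this means several inserts can affect the same row in the summary table. But since nothing waits on the result of an insert before continuing, this should not cause a deadlock. The inserts are done in batches: a bulk insert is issued every few milliseconds. They can be issued concurrently from different applications. Since these bulk insert statements are never part of a larger transaction, I do not quite understand why they would deadlock. If anyone can explain why this happens, it would be much appreciated! In the error log, I only see multiple lines like: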

RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `logschema`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap
Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0

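Now it seems I have finally managed to get rid of the deadlocks by manually taking a MySQL table lock with a LOCK TABLES statement before running the bulk insert. I know that table-level locks on InnoDB tables are strongly frowned upon, but since I added this table lock I have not seen a single deadlock.

Does it make sense that a table-level lock resolves deadlocks like this? Is it an acceptable way to solve this kind of problem, or should table locks be avoided at all costs with InnoDB tables?

Edit: the summary table looks like this: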

 CREATE TABLE `table_summary_stats` (
  `id` bigint DEFAULT NULL,
  `DateAndTime` datetime NOT NULL,
  `address` varchar(45) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `group` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `result` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `count` int DEFAULT NULL,
  PRIMARY KEY (`DateAndTime`,`group`,`result`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (to_days(`DateAndTime`))
(PARTITION p_2020_10_26 VALUES LESS THAN (738090) ENGINE = InnoDB,
 PARTITION p_2020_11_10 VALUES LESS THAN (738105) ENGINE = InnoDB,
 PARTITION overflow VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

The trigger does this:

    -- `group` is a reserved word in MySQL, so it is backtick-quoted here.
    INSERT INTO table_summary_stats
    SET
        DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
        address = NEW.address,
        `group` = NEW.`group`,
        result = NEW.result,
        count = 1
    ON DUPLICATE KEY UPDATE
        count = count + 1

Here is the relevant deadlock information:

------------------------
 LATEST DETECTED DEADLOCK
 ------------------------
 2020-11-02 20:00:53 0x7f0cc032a700
 *** (1) TRANSACTION:
 TRANSACTION 7600352761, ACTIVE 0 sec inserting
 mysql tables in use 2, locked 2
 LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 3
 MySQL thread id 874850, OS thread handle 139654885635840, query id 3299800570 10.15.0.91 cdrwriter update
    INSERT INTO table_summary_stats
    SET 
        DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
        address = NEW.address, 
        group = NEW.group,
        result = NEW.result,
        count = 1
    on duplicate key
    update
        count = count + 1
 
 *** (1) HOLDS THE LOCK(S):
 RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352761 lock_mode X locks rec but not gap
 Record lock, heap no 10 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
  0: len 5; hex 99a7c53ec0; asc    > ;;
  1: len 4; hex 74657374; asc test;;
  2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc {   "status": "error",   "code; (total 76 bytes);
  3: len 11; hex 3933373931303130353131; asc 93791010511;;
  4: len 6; hex 0001c5042df9; asc     - ;;
  5: len 7; hex 01000053520238; asc    SR 8;;
  6: SQL NULL;
  7: len 4; hex 80057c22; asc   |";;
  8: len 8; hex 80000000642f4d05; asc     d/M ;;
  9: len 8; hex 8000000000c03473; asc       4s;;
  10: len 8; hex 800000001a7e7aee; asc      ~z ;;
  11: len 8; hex 8000000000f2b5b1; asc         ;;
  12: len 8; hex 800000008060b217; asc      `  ;;
 
 
 *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352761 lock_mode X locks rec but not gap waiting
 Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
  0: len 5; hex 99a7c54000; asc    @ ;;
  1: len 4; hex 74657374; asc test;;
  2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc {   "status": "error",   "code; (total 76 bytes);
  3: len 11; hex 3933373931303130353131; asc 93791010511;;
  4: len 6; hex 0001c5042cdc; asc     , ;;
  5: len 7; hex 02000004ea07ff; asc        ;;
  6: SQL NULL;
  7: len 4; hex 8003095b; asc    [;;
  8: len 8; hex 8000000036a3a0bb; asc     6   ;;
  9: len 8; hex 8000000000785507; asc      xU ;;
  10: len 8; hex 800000000e23089a; asc      #  ;;
  11: len 8; hex 80000000008c8e08; asc         ;;
  12: len 8; hex 8000000045cb8c64; asc     E  d;;
 
 
 *** (2) TRANSACTION:
 TRANSACTION 7600352476, ACTIVE 0 sec inserting
 mysql tables in use 2, locked 2
 LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 75
 MySQL thread id 874775, OS thread handle 139672774735616, query id 3299800787 10.15.0.90 cdrwriter update
    INSERT INTO table_summary_stats
    SET 
        DateAndTime = date_format(from_unixtime(NEW.appEpochMilli/1000), '%Y-%m-%d %H:%i:00'),
        address = NEW.address, 
        group = NEW.group,
        result = NEW.result,
        count = 1
    on duplicate key
    update
        count = count + 1
 
 *** (2) HOLDS THE LOCK(S):
 RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap
 Record lock, heap no 11 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
  0: len 5; hex 99a7c54000; asc    @ ;;
  1: len 4; hex 74657374; asc test;;
  2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc {   "status": "error",   "code; (total 76 bytes);
  3: len 11; hex 3933373931303130353131; asc 93791010511;;
  4: len 6; hex 0001c5042cdc; asc     , ;;
  5: len 7; hex 02000004ea07ff; asc        ;;
  6: SQL NULL;
  7: len 4; hex 8003095b; asc    [;;
  8: len 8; hex 8000000036a3a0bb; asc     6   ;;
  9: len 8; hex 8000000000785507; asc      xU ;;
  10: len 8; hex 800000000e23089a; asc      #  ;;
  11: len 8; hex 80000000008c8e08; asc         ;;
  12: len 8; hex 8000000045cb8c64; asc     E  d;;
 
 
 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
 RECORD LOCKS space id 118597 page no 67 n bits 80 index PRIMARY of table `sms_cdr`.`table_summary_stats` /* Partition `p_2020_11_02` */ trx id 7600352476 lock_mode X locks rec but not gap waiting
 Record lock, heap no 10 PHYSICAL RECORD: n_fields 13; compact format; info bits 0
  0: len 5; hex 99a7c53ec0; asc    > ;;
  1: len 4; hex 74657374; asc test;;
  2: len 30; hex 7b0a202022737461747573223a20226572726f72222c0a202022636f6465; asc {   "status": "error",   "code; (total 76 bytes);
  3: len 11; hex 3933373931303130353131; asc 93791010511;;
  4: len 6; hex 0001c5042df9; asc     - ;;
  5: len 7; hex 01000053520238; asc    SR 8;;
  6: SQL NULL;
  7: len 4; hex 80057c22; asc   |";;
  8: len 8; hex 80000000642f4d05; asc     d/M ;;
  9: len 8; hex 8000000000c03473; asc       4s;;
  10: len 8; hex 800000001a7e7aee; asc      ~z ;;
  11: len 8; hex 8000000000f2b5b1; asc         ;;
  12: len 8; hex 800000008060b217; asc      `  ;;
 
 *** WE ROLL BACK TRANSACTION (1)

1 Answer


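"The inserts are done in batches": sort each batch by the 4-column PK. This should eliminate many of the deadlocks and turn the rest into "lock waits". (That is, where there would have been a deadlock, one connection can simply wait for the other to finish.)

Also, if practical, limit the batches to 100 rows.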
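A minimal sketch of those two suggestions on the application side, assuming each raw row is a tuple `(epoch_ms, group, result, address)` (a hypothetical shape, not taken from the question). The sort key mirrors the summary table's PK order `(DateAndTime, group, result, address)`. Strings are case-folded because the columns use a case-insensitive collation; this only approximates MySQL's ordering, and what matters is that every writer uses the same ordering.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical raw-row shape: (epoch_ms, group, result, address).
RawRow = Tuple[int, str, str, str]


def pk_sort_key(row: RawRow) -> Tuple[int, str, str, str]:
    """Key that follows the summary table's PK order, minute bucket first."""
    epoch_ms, group, result, address = row
    # Case-fold so 'A' and 'a' sort together, roughly like a *_ci collation.
    return (epoch_ms // 60000, group.casefold(), result.casefold(), address.casefold())


def sorted_batches(rows: Sequence[RawRow], batch_size: int = 100) -> Iterator[List[RawRow]]:
    """Yield the rows in PK order, at most batch_size rows per batch."""
    ordered = sorted(rows, key=pk_sort_key)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Each yielded batch would then go into one bulk INSERT, so concurrent writers touch the summary rows in the same order.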

Starting the PRIMARY KEY with the partition key is almost always useless.

(I agree that you should try to avoid LOCK TABLES.)

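Explanation

The classic deadlock is:

I grab row 1, you grab row 2, then I reach for row 2 (but cannot get it), and you reach for row 1 (but cannot get it). Neither of us is willing to let go of what we hold.

So a referee steps in and forces one of us to give back what he holds, letting the other continue to completion.

It is not possible (or not practical) for me (or you) to grab all the needed rows up front, so the rows are really grabbed one at a time. Think of a giant UPDATE that is changing millions of rows. It would be unwise to stop everything while I grab all those rows.

This is called "optimistic": processing assumes it will succeed and plows ahead. And 99.999...% of the time a typical transaction will finish before any other connection conflicts with it.

If we both grab rows in the same "order" (for example, PRIMARY KEY order), one of us can finish and the other can simply wait. If the wait is only a few milliseconds, the delay is imperceptible. (Limiting the batch size helps here.)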
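To illustrate the ordering argument outside the database, here is a small sketch that uses Python thread locks as stand-ins for row locks. It is only an analogy, not how InnoDB is implemented; the names `update_rows` and `run_demo` are made up for this example. "Me" wants rows 1 then 2, "you" want rows 2 then 1, but both sort before grabbing, so one finishes and the other just waits.

```python
import threading
from typing import Dict, List


def update_rows(locks: Dict[int, threading.Lock], row_ids: List[int],
                done: List[str], name: str) -> None:
    """Grab every needed 'row lock' in sorted order, then do the work."""
    ordered = sorted(row_ids)  # the same order for every worker
    for row_id in ordered:
        locks[row_id].acquire()
    try:
        done.append(name)  # stand-in for the actual row updates
    finally:
        for row_id in reversed(ordered):
            locks[row_id].release()


def run_demo() -> List[str]:
    """Run two workers that ask for the same rows in opposite order."""
    locks = {1: threading.Lock(), 2: threading.Lock()}
    done: List[str] = []
    workers = [
        threading.Thread(target=update_rows, args=(locks, [1, 2], done, "me")),
        threading.Thread(target=update_rows, args=(locks, [2, 1], done, "you")),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join(timeout=5)
    return sorted(done)
```

Without the `sorted()` call, the two workers could each hold one lock and wait forever on the other, which is the deadlock described above.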

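Better?

It may be better (that is, faster and less deadlock-prone) to get rid of the trigger and simply run two batched statements: one for the original batch INSERT and one for a batched upsert (aka IODKU, INSERT ... ON DUPLICATE KEY UPDATE) into the summary table.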
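A rough sketch of what the second statement could look like when built in the application, assuming raw rows shaped as `(epoch_ms, address, group, result)` and a driver that uses `%s` placeholders (both are assumptions; `build_summary_upsert` is a hypothetical helper). The rows are pre-aggregated per minute, so each summary row is touched once per batch, and the keys are emitted in PK order. `VALUES(col)` in the UPDATE clause still works in MySQL 8.0 but is deprecated in favour of a row alias.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical raw-row shape: (epoch_ms, address, group, result).
RawRow = Tuple[int, str, str, str]


def build_summary_upsert(raw_rows: Iterable[RawRow]) -> Tuple[Optional[str], List[object]]:
    """Return (sql, params) for one multi-row IODKU, or (None, []) if empty."""
    counts: Counter = Counter()
    for epoch_ms, address, group, result in raw_rows:
        minute_epoch = (epoch_ms // 60000) * 60  # seconds, floored to the minute
        counts[(minute_epoch, group, result, address)] += 1
    if not counts:
        return None, []
    keys = sorted(counts)  # PK order: DateAndTime, group, result, address
    placeholders = ", ".join(["(FROM_UNIXTIME(%s), %s, %s, %s, %s)"] * len(keys))
    sql = (
        "INSERT INTO table_summary_stats "
        "(`DateAndTime`, `group`, `result`, `address`, `count`) "
        f"VALUES {placeholders} "
        "ON DUPLICATE KEY UPDATE `count` = `count` + VALUES(`count`)"
    )
    params: List[object] = [value for key in keys for value in (*key, counts[key])]
    return sql, params
```

The returned pair would be passed to something like `cursor.execute(sql, params)` right after the raw batch INSERT, ideally in the same transaction.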

In any case, catch errors in the transaction and replay the entire transaction.
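A hedged sketch of such a replay loop. It assumes the driver's exception exposes the MySQL error number as an `errno` attribute (an assumption; some drivers put it elsewhere), and that `run_transaction` is a hypothetical callable that runs the whole transaction from BEGIN to COMMIT. Error 1213 is MySQL's deadlock error; InnoDB has already rolled back the victim's transaction when it is raised, so the whole thing has to be run again.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MYSQL_ER_LOCK_DEADLOCK = 1213  # "Deadlock found when trying to get lock"


def run_with_deadlock_retry(run_transaction: Callable[[], T],
                            max_attempts: int = 3,
                            backoff_seconds: float = 0.01) -> T:
    """Run the transaction, replaying it from the start after a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except Exception as exc:
            # Assumption: the driver stores the MySQL error code in `errno`.
            is_deadlock = getattr(exc, "errno", None) == MYSQL_ER_LOCK_DEADLOCK
            if not is_deadlock or attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # brief pause before replaying
    raise ValueError("max_attempts must be at least 1")
```

Any other error, or a deadlock on the last attempt, is re-raised to the caller unchanged.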

More discussion of high-speed ingestion: http://mysql.rjweb.org/doc.php/staging_table (It is not directly applicable, but some relevant tips can be found there.)

Answered 2020-11-03T02:04:53.603