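I am trying to parse the INNODB status in Java using regular expressions, in order to extract the information about deadlocks. I am using the following regex to get the block of text that belongs to LATEST DETECTED DEADLOCK.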

String innodbStatus =  <required_INNODB_status>; // assume this is the text
String multiLineRegEx = "((.*)(?:(?:\r\n|[\r\n]))*)*";
String newLineCharacter = System.getProperty("line.separator");
String myRegEx = "[-]+" + newLineCharacter + deadLockLabel + newLineCharacter + "[-]+" +  newLineCharacter + multiLineRegEx + "[-]+";

assertTrue(innodbStatus.matches(myRegEx));

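I am not sure how to extract the multi-line text, i.e. the deadlock-related text, from the full InnoDB status. The test above passes, but the code below does not return the string I am looking for: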

Pattern pattern = Pattern.compile(multiLineRegEx);
Matcher matcher = pattern.matcher(targetFileStr);
if (matcher.find())
{
  String requiredString = matcher.group(1);
}

Any suggestions on how to extract the required text? Here is a sample of the INNODB status I am trying to parse:

=====================================
130502 14:18:59 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 27 seconds
------------------------
LATEST DETECTED DEADLOCK
------------------------
060717  4:16:48
*** (1) TRANSACTION:
TRANSACTION 0 42313619, ACTIVE 49 sec, process no 10099, OS thread id 3771312 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320
MySQL thread id 30898, query id 100626 localhost root Updating
update iz set pad='a' where i=2
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313619 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000002; asc     ;; 1: len 6; hex 00000285a78f; asc       ;; 2: len 7; hex 00000040150110; asc    @   ;; 3: len 10; hex 61202020202020202020; asc a         ;;

*** (2) TRANSACTION:
TRANSACTION 0 42313620, ACTIVE 24 sec, process no 10099, OS thread id 4078512 starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
3 lock struct(s), heap size 320
MySQL thread id 30899, query id 100627 localhost root Updating
update iz set pad='a' where i=1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313620 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000002; asc     ;; 1: len 6; hex 00000285a78f; asc       ;; 2: len 7; hex 00000040150110; asc    @   ;; 3: len 10; hex 61202020202020202020; asc a         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 16403 n bits 72 index `PRIMARY` of table `test/iz` trx id 0 42313620 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000285a78e; asc       ;; 2: len 7; hex 000000003411d9; asc     4  ;; 3: len 10; hex 61202020202020202020; asc a         ;;

*** WE ROLL BACK TRANSACTION (2)

----------
SEMAPHORES
----------

1 Answer

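Your regex appears to be trying to match and capture the text between lines made up of dashes, after finding an acceptable header/prefix at which to start matching (deadLockLabel, which I assume is initialized to something like LATEST DETECTED DEADLOCK, since it is missing from your code).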

I'd recommend the following:

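  • Change [-]+ to just -+; the [] is redundant.
  • Add + newLineCharacter to the end of the regex, so that the text you capture lies between complete lines consisting only of -, not just sequences of - that happen to appear at the start of a line.
  • Ideally, replace the uses of newLineCharacter with ^ and $ as appropriate, by passing the MULTILINE flag.
  • Replace the regex used to capture the text with something like ((?:.*(?:\r\n|\r|\n))+?) (note the non-greediness of the + operator), or better still, just (.+?) with the DOTALL flag passed.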
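To illustrate why the non-greedy quantifier matters, here is a minimal sketch. The class name GreedyVsReluctant, the helper capture and the tiny sample text are made up for this illustration and are not from your code. Both patterns are compiled with MULTILINE and DOTALL and differ only in (.+) versus (.+?).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares a greedy and a reluctant capture.
class GreedyVsReluctant {

    // Returns group 1 of the first match, or null if nothing matches.
    static String capture(String regex, String text) {
        Matcher m = Pattern
                .compile(regex, Pattern.MULTILINE | Pattern.DOTALL)
                .matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Two dash-delimited sections, "A" and "B" (made-up sample).
        String text = "----\nA\n----\nbody\n----\nB\n----\nother\n";

        // Greedy: the capture runs on to the LAST dash-only line.
        System.out.println(capture("^-+\\nA\\n-+\\n(.+)^-+$", text));

        // Reluctant: the capture stops at the FIRST dash-only line.
        System.out.println(capture("^-+\\nA\\n-+\\n(.+?)^-+$", text));
    }
}
```

With the greedy form the capture swallows the header of section B as well; with the reluctant form it contains only the body of section A.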

The resulting optimized regex looks like this:

^-{4,}+(?:\r\n|\r|\n)LATEST DETECTED DEADLOCK(?:\r\n|\r|\n)-{4,}+(?:\r\n|\r|\n)(.+?)-{4,}+$

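I've insisted that the lines of dashes be at least 4 characters long, and told the regex to be possessive about consuming dashes. The above requires the MULTILINE and DOTALL flags to be passed.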
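Putting it together in Java might look roughly like the sketch below. The class name DeadlockExtractor and the method extract are hypothetical, and the label LATEST DETECTED DEADLOCK is hard-coded on the assumption that this is what deadLockLabel holds. Two choices here are mine: the capture group is the reluctant (.+?) so it stops at the first dash line after the section, and the closing dash run is anchored with ^ so that it has to be a whole line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class DeadlockExtractor {

    // Any of the three common line terminators.
    private static final String NL = "(?:\\r\\n|\\r|\\n)";

    // MULTILINE: ^ and $ match at line boundaries.
    // DOTALL: '.' also matches line terminators.
    private static final Pattern DEADLOCK_SECTION = Pattern.compile(
            "^-{4,}+" + NL
            + "LATEST DETECTED DEADLOCK" + NL   // assumed value of deadLockLabel
            + "-{4,}+" + NL
            + "(.+?)"                           // reluctant: shortest possible body
            + "^-{4,}+$",                       // next line made only of dashes
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns the deadlock section body, or empty if the status has none.
    static Optional<String> extract(String innodbStatus) {
        Matcher m = DEADLOCK_SECTION.matcher(innodbStatus);
        // find() searches inside the text; matches() would need the
        // pattern to cover the entire status output.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened, made-up status text for demonstration only.
        String status = "Per second averages calculated from the last 27 seconds\n"
                + "------------------------\n"
                + "LATEST DETECTED DEADLOCK\n"
                + "------------------------\n"
                + "060717  4:16:48\n"
                + "*** WE ROLL BACK TRANSACTION (2)\n"
                + "\n"
                + "----------\n"
                + "SEMAPHORES\n"
                + "----------\n";
        System.out.println(extract(status).orElse("<no deadlock section>"));
    }
}
```

Note that the captured text keeps its trailing line breaks, so you may want to call trim() on the result.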

Answered 2013-05-06T04:26:01.417