I want to parse a multiline text, so I wrote something like this:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";
String regex = "\\[(.*)\\] (.*) - (.*)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

What I want to get is this:

G1: timestamp1
G2: INFO
G3: message1

G1: timestamp2
G2: ERROR
G3: message2

G1: timestamp3
G2: INFO
G3: message3
    message_details1....
    message_details2...

But what I get is like this:

G1: timestamp1] INFO - Message1
    [timestamp2] ERROR - Message2
    [timestamp3
G2: INFO
G3: Message3
    Message3_details1........
    Message3_details2........

I'm not able to solve that even with Google's help.

2 Answers

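You are using a greedy quantifier in your regex, so the .* in \\[(.*)\\] consumes everything up to the last ] it can find. You need a reluctant quantifier instead: add a ? after the .* to get .*?.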
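To see the difference in isolation, here is a minimal sketch that runs both quantifiers against a single line. The class name GreedyVsReluctantDemo, the helper firstGroup and the sample line are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question's code.
class GreedyVsReluctantDemo {

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "[timestamp1] INFO - [timestamp2] ERROR";

        // Greedy: .* grabs as much as possible, then backtracks to the LAST ']'.
        System.out.println(firstGroup("\\[(.*)\\]", line));   // timestamp1] INFO - [timestamp2

        // Reluctant: .*? grows one character at a time and stops at the FIRST ']'.
        System.out.println(firstGroup("\\[(.*?)\\]", line));  // timestamp1
    }
}
```

With Pattern.DOTALL the greedy version gets worse, because . then also matches line breaks and the group can run across the whole input, which is what happened in the question.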

Also, for the last .*, you need a look-ahead so that it stops before the next [.

The following code will work:

String text = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

String regex = "\\[(.*?)\\] (.*?) - (.*?)(?=\\[|$)";

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println("G1: " + m.group(1));
    System.out.println("G2: " + m.group(2));
    System.out.println("G3: " + m.group(3));
    System.out.println();
}

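The last part of the regex, (.*?)(?=\\[|$), matches everything up to the next [ (here, the one that starts the next entry) or up to the end of the input ($). The $ is needed so that the last two lines are captured in group 3 of the final match.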

Output:

G1: timestamp1
G2: INFO
G3: Message1 


G1: timestamp2
G2: ERROR
G3: Message2 


G1: timestamp3
G2: INFO
G3: Message3 
Message3_details1......... 
Message3_details2 ......... 
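One caveat: the look-ahead (?=\\[|$) stops at any [, so a message that itself contains a [ would be cut short, and group 3 keeps its trailing line break (hence the extra blank lines above). If that matters, a possible variant, sketched here under the assumption that every entry starts with [ at the beginning of a line, is to anchor the look-ahead with ^ in MULTILINE mode and trim the captured message. LogEntryParser and parse are illustrative names, not anything from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes each log entry begins with '[' at the start of a line.
class LogEntryParser {

    // Sample input taken from the question.
    static final String SAMPLE = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // DOTALL lets the message span lines; MULTILINE lets ^ match at each line start.
    // \z (end of input) replaces $, because in MULTILINE mode $ would also match
    // at the end of every line and end the message too early.
    private static final Pattern ENTRY = Pattern.compile(
            "\\[(.*?)\\] (.*?) - (.*?)(?=^\\[|\\z)",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Returns one {timestamp, level, message} array per entry, message trimmed.
    static List<String[]> parse(String text) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            entries.add(new String[] { m.group(1), m.group(2), m.group(3).trim() });
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String[] e : parse(SAMPLE)) {
            System.out.println("G1: " + e[0]);
            System.out.println("G2: " + e[1]);
            System.out.println("G3: " + e[2]);
            System.out.println();
        }
    }
}
```

The trim() call only removes whitespace at the two ends of the message, so the line breaks between Message3 and its detail lines are kept.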
answered 2013-10-07T10:39:41.567

Try "\\[(.*?)\\] (.*?) - (.*?) \\r\\n"
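A quick way to check what this pattern captures on the question's input is the sketch below. It assumes the pattern is compiled without any flags, since none are given here; FirstLineOnlyDemo and messages are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; compiles the suggested pattern with no flags (an assumption).
class FirstLineOnlyDemo {

    // Sample input taken from the question.
    static final String SAMPLE = "[timestamp1] INFO - Message1 \r\n"
            + "[timestamp2] ERROR - Message2 \r\n"
            + "[timestamp3] INFO - Message3 \r\n"
            + "Message3_details1......... \r\n"
            + "Message3_details2 ......... \r\n";

    // Collects group 3 (the message) of every match.
    static List<String> messages(String text) {
        Pattern p = Pattern.compile("\\[(.*?)\\] (.*?) - (.*?) \\r\\n");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each message ends at the first " \r\n" after the " - " separator.
        System.out.println(messages(SAMPLE));
    }
}
```

On this input the third message comes back as Message3 only: the match ends at the first " \r\n", and the two detail lines contain no [ for a later match to start from, so they are not captured. The pattern also relies on every line ending in a space followed by \r\n.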

answered 2013-10-07T10:52:51.600