
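What regular expression should I use to extract multiple text blocks, each introduced by a header, so that both the headers and the block bodies can be parsed? For example: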

some text info before message sequence
============
first message header that should be parsed (may contain = character)
============
first multiline
message body that
should also be parsed
(may contain = character)
============
second message header that should be parsed
============
second multiline
message body that
should also be parsed
... and so on

I tried using:

String regex = "^=+$\n"+
        "^(.+)$\n"+
        "^=+$\n"+
        "((?s:(?!(^=.+)).+))";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);

The ((?s:(?!(^=.+)).+)) part also swallows the second message. Here is a test that demonstrates the problem:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;
public class ParsingTest {
@Test
public void test() {
    String fstMsgHeader = "first message header that should be parsed (may contain = character)";
    String fstMsgBody = "first multiline\n"+
                        "message body that\n"+
                        "should also be parsed\n"+
                        "(may contain = character)";
    String sndMsgHeader = "second message header that should be parsed";
    String sndMsgBody = "second multiline\n"+
            "message body that\n"+
            "should also be parsed\n"+
            "... and so on";
    String sample = "some text info before message sequence\n"+
                    "============\n"+
                    fstMsgHeader+"\n"+
                    "============\n"+
                    fstMsgBody+"\n"+
                    "============\n"+
                    sndMsgHeader+"\n"+
                    "============\n"+
                    sndMsgBody +"\n";
    System.out.println(sample);
    String regex =  "^=+$\n"+
                    "^(.+)$\n"+
                    "^=+$\n"+
                    "((?s:(?!(^=.+)).+))";
    Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
    Matcher matcher = p.matcher(sample);
    int blockNumber = 1;
    while (matcher.find()) {
        System.out.println("Block "+blockNumber+": "+matcher.group(0)+"\n_________________");
        if (blockNumber == 1) {
            Assert.assertEquals(fstMsgHeader, matcher.group(1));
            Assert.assertEquals(fstMsgBody, matcher.group(2));
        } else {
            Assert.assertEquals(sndMsgHeader, matcher.group(1));
            Assert.assertEquals(sndMsgBody, matcher.group(2));
        }
        blockNumber++;
    }
}

}
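For reference, the lookahead in ((?s:(?!(^=.+)).+)) is evaluated only once, at the position where the body starts; after that, .+ in (?s) mode simply runs to the end of the input. A common fix is a "tempered" dot, (?:(?!X).)*, which re-checks the lookahead before every character. The sketch below assumes the question's format (separator lines consist only of '=' characters, headers fit on one line); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class TemperedBlockParser {

    // MULTILINE: ^ and $ match at line boundaries.
    // DOTALL: '.' also matches '\n', so a body can span several lines.
    // Body = tempered dot: before consuming each character, the negative
    // lookahead stops if we are at "\n" followed by a line made only of '='
    // (the next separator), or at a final "\n" right before the end of input.
    static final Pattern BLOCK = Pattern.compile(
            "^=+$\\n"                       // opening separator line
          + "^([^\\n]+)$\\n"                // group 1: one-line header
          + "^=+$\\n"                       // closing separator line
          + "((?:(?!\\n=+$|\\n\\z).)*)",    // group 2: body up to the next separator
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns {header, body} pairs in the order they appear in the text.
    static List<String[]> parse(String text) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(new String[] {m.group(1), m.group(2)});
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "text before the first block\n"
                + "============\n"
                + "first header (may contain = character)\n"
                + "============\n"
                + "first multiline\n"
                + "body (may contain = character)\n"
                + "============\n"
                + "second header\n"
                + "============\n"
                + "second multiline\n"
                + "body\n";
        for (String[] block : parse(sample)) {
            System.out.println("header: " + block[0]);
            System.out.println("body:   " + block[1]);
            System.out.println("---");
        }
    }
}
```

A body line such as "= x" is kept, because the lookahead only stops at lines made entirely of '='. The \n\z alternative keeps the trailing newline out of the last body, which is what the test above compares against.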


1 Answer


I'm not sure this is exactly what you're looking for, but maybe this regex will help:

String regex = 
        "={12}\n" +   // twelve '=' marks and new line mark
        "(.+?)" +     // minimal match that has
        "\n={12}\n" + // new line mark with twelve '=' marks after it
        "(.+?)(?=\n={12}|$)"; // minimal match that will have new line
                              // character and twelve `=` marks after
                              // it or end of data $

To make it work, you also need the Pattern.DOTALL flag so that the dot matches line breaks as well:

Pattern p = Pattern.compile(regex, Pattern.DOTALL);
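A minimal runnable check of the pattern above, wrapped in a hypothetical TwelveEqualsParser class (the class and method names are illustrative, not part of the answer). Like the pattern itself, it assumes every separator is exactly twelve '=' characters; separators of other lengths are not handled reliably.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class TwelveEqualsParser {

    // Assumes separator lines are exactly twelve '=' characters.
    // DOTALL lets '.' match '\n'. MULTILINE is NOT set, so '$' matches only at
    // the end of input or just before a final line terminator.
    static final Pattern BLOCK = Pattern.compile(
            "={12}\n"                // separator followed by a newline
          + "(.+?)"                  // group 1: header (shortest possible match)
          + "\n={12}\n"              // newline, separator, newline
          + "(.+?)(?=\n={12}|$)",    // group 2: body up to the next separator or the end
            Pattern.DOTALL);

    // Returns {header, body} pairs in the order they appear in the text.
    static List<String[]> parse(String text) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(new String[] {m.group(1), m.group(2)});
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "some text info before message sequence\n"
                + "============\n"
                + "first message header (may contain = character)\n"
                + "============\n"
                + "first multiline\n"
                + "message body (may contain = character)\n"
                + "============\n"
                + "second message header\n"
                + "============\n"
                + "second multiline\n"
                + "... and so on\n";
        for (String[] block : parse(sample)) {
            System.out.println("header: " + block[0]);
            System.out.println("body:   " + block[1]);
            System.out.println("---");
        }
    }
}
```

Because MULTILINE is off, the $ in the lookahead only succeeds at the very end of the input (or just before the final newline), so the last body comes out without its trailing newline, matching what the question's test expects.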
answered 2013-08-20T15:54:09.677