1

I'm using Java and want to build two regular expressions, one for each of two different scenarios:

1:

STARTText blah, blah
\    next line with more text, but the leading backslash
\    next line with more text, but the leading backslash
\    next line with more text, but the leading backslash

until the first line that no longer starts with a backslash.

2:

Now you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

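This block ends with an extra empty line, here after the 8978 line. I also know that the block of lines with leading digits repeats 10 times and is then finished.

Filtering a single line is possible, but how do I handle the multiple line breaks in between? I don't really know when or how to end the match, even for the first block, where I also have to search for the backslashes. My approach is to have one closed expression per scenario, which I can also use with replaceAll().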

4 Answers

1

First regex:

Pattern regex = Pattern.compile(
    "^          # Start of line\n" +
    "STARTText  # Match this text\n" +
    ".*\\r?\\n  # Match whatever follows on the line plus (CR)LF\n" +
    "(?:        # Match...\n" +
    " ^\\\\     # Start of line, then a backslash\n" +
    " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
    ")*         # Repeat as needed", 
    Pattern.MULTILINE | Pattern.COMMENTS);

Second regex:

Pattern regex = Pattern.compile(
    "(?:        # Match...\n" +
    " ^         # Start of line\n" +
    " \\d{4}\\b # Match exactly four digits\n" +
    " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
    ")+         # Repeat as needed (at least once)", 
    Pattern.MULTILINE | Pattern.COMMENTS);
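
As a rough usage sketch, the two patterns above can be applied with Matcher.find() and replaceAll(). The class name BlockMatchSketch, the helper findAll and the sample input are assumptions made for illustration; the expressions are the same as above, only written without COMMENTS mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockMatchSketch {
    // First regex: a STARTText line, then any number of lines starting with a backslash.
    static final Pattern START_BLOCK =
        Pattern.compile("^STARTText.*\\r?\\n(?:^\\\\.*\\r?\\n)*", Pattern.MULTILINE);

    // Second regex: one or more consecutive lines starting with exactly four digits.
    static final Pattern DIGIT_BLOCK =
        Pattern.compile("(?:^\\d{4}\\b.*\\r?\\n)+", Pattern.MULTILINE);

    // Hypothetical helper: collects every complete match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample input modelled on the two scenarios in the question.
        String input =
            "STARTText blah, blah\n"
            + "\\    next line with more text\n"
            + "\\    next line with more text\n"
            + "Now you will see the following links for the items:\n"
            + "1111 leading 4 digits and then some text\n"
            + "2565 leading 4 digits and then some text\n"
            + "\n"
            + "trailing text\n";

        System.out.println(findAll(START_BLOCK, input));
        System.out.println(findAll(DIGIT_BLOCK, input));

        // The same compiled pattern can remove a whole block in one call.
        System.out.println(START_BLOCK.matcher(input).replaceAll(""));
    }
}
```

Both patterns require a line break at the end of every matched line, so a final line without a trailing newline would not be included.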
answered 2013-05-31T12:51:47.873
1

Regex 1:

/^STARTText.*?(\r?\n)(?:^\\.*?\1)+/m

Live demo: http://www.rubular.com/r/G35kIn3hQ4

Regex 2:

/^.*?(\r?\n)(?:^\d{4}\s.*?\1)+/m

Live demo: http://www.rubular.com/r/TxFbBP1jLJ

Edit:

Java demo 1: http://ideone.com/BPNrm6

Regex 1 in Java:

(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+

Java demo 2: http://ideone.com/TQB8Gs

Regex 2 in Java:

(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+
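
A minimal sketch of how these two Java strings might be used. Group 1 captures the line terminator of the first line, and the backreference \1 then requires the same terminator on every following line. The class name BackrefSketch, the helper firstMatch and the sample inputs are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefSketch {
    // Regex 1 in Java: STARTText line plus at least one backslash line.
    static final Pattern BLOCK_1 =
        Pattern.compile("(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+");

    // Regex 2 in Java: any header line plus at least one four-digit line.
    static final Pattern BLOCK_2 =
        Pattern.compile("(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up input with Windows (CRLF) line endings.
        String crlf = "STARTText blah\r\n\\ one\r\n\\ two\r\nrest";
        // Made-up input with Unix (LF) line endings.
        String lf = "intro\nNow you will see:\n1111 a\n2565 b\n\nend";

        System.out.println(firstMatch(BLOCK_1, crlf));
        System.out.println(firstMatch(BLOCK_2, lf));
    }
}
```

Unlike the patterns in the first answer, these use + for the continuation lines, so a header line with no continuation line after it is not matched.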
answered 2013-05-31T12:57:00.193
1

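In both cases I use a zero-width lookahead assertion, (?=^[^\\]), to find where the block ends: the match stops just before the next line that does not start with the continuation character.

  • (?= starts the zero-width lookahead, which requires the value to be present but does not consume it
  • ^[^\\] matches the start of a line followed by any character other than \
  • ) closes the assertion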
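
The "does not consume" part can be seen in a small sketch: after each match, the character at m.end() is the first character of the next block, which the lookahead inspected but left in place for the next find(). The class name LookaheadSketch, the helper blocks and the sample input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSketch {
    // MULTILINE lets ^ match at each line start; DOTALL lets . cross line breaks.
    static final Pattern BLOCK = Pattern.compile(
        "^([^\\\\].*?)(?=^[^\\\\])", Pattern.MULTILINE | Pattern.DOTALL);

    // Hypothetical helper: returns the text of every matched block.
    static List<String> blocks(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up input: two header lines, each followed by backslash lines.
        String input = "head one\n\\ a\n\\ b\nhead two\n\\ c\ntail\n";
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            // The lookahead checked input.charAt(m.end()) without consuming it.
            System.out.println("match ends at " + m.end()
                + ", next char is '" + input.charAt(m.end()) + "'");
        }
        System.out.println(blocks(input).size() + " blocks found");
    }
}
```

Because the lookahead needs a following line that does not start with a backslash, a block at the very end of the input is not matched by this pattern.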

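Part 1

This matches all of the text in part 1: the first line, followed by any number of lines that start with \, captured together as one group.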

^([^\\].*?)(?=^[^\\])


    Java Code Example:
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    class Module1{
      public static void main(String[] asd){
        // A Java string literal cannot span lines, so the lines are joined with \n
        String sourcestring = "STARTFirstText blah, blah\n"
          + "\\    1next line with more text, but the leading backslash\n"
          + "\\    2next line with more text, but the leading backslash\n"
          + "\\    3next line with more text, but the leading backslash\n"
          + "STARTsecondText blah, blah\n"
          + "\\    4next line with more text, but the leading backslash\n"
          + "\\    5next line with more text, but the leading backslash\n"
          + "\\    6next line with more text, but the leading backslash\n"
          + "foo";
        // MULTILINE: ^ matches at each line start; DOTALL: . also matches line breaks
        Pattern re = Pattern.compile("^([^\\\\].*?)(?=^[^\\\\])",Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()){
          // group 0 is the whole match, group 1 the captured block
          for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
            System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
          }
          mIdx++;
        }
      }
    }

    $matches Array:
    (
        [0] => Array
            (
                [0] => STARTFirstText blah, blah
    \    1next line with more text, but the leading backslash
    \    2next line with more text, but the leading backslash
    \    3next line with more text, but the leading backslash

                [1] => STARTsecondText blah, blah
    \    4next line with more text, but the leading backslash
    \    5next line with more text, but the leading backslash
    \    6next line with more text, but the leading backslash

            )

        [1] => Array
            (
                [0] => STARTFirstText blah, blah
    \    1next line with more text, but the leading backslash
    \    2next line with more text, but the leading backslash
    \    3next line with more text, but the leading backslash

                [1] => STARTsecondText blah, blah
    \    4next line with more text, but the leading backslash
    \    5next line with more text, but the leading backslash
    \    6next line with more text, but the leading backslash

            )

    )

Part 2

This matches the first line, followed by any number of lines that start with a digit.

^([^\d].*?)(?=^[^\d])

Example

import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
  public static void main(String[] asd){
    // A Java string literal cannot span lines, so the lines are joined with \n
    String sourcestring = "First you will see the following links for the items:\n"
      + "1111 leading 4 digits and then some text\n"
      + "2565 leading 4 digits and then some text\n"
      + "8978 leading 4 digits and then some text\n"
      + "\n"
      + "Second you will see the following links for the items:\n"
      + "2222 leading 4 digits and then some text\n"
      + "3333 leading 4 digits and then some text\n"
      + "4444 leading 4 digits and then some text";
    // MULTILINE: ^ matches at each line start; DOTALL: . also matches line breaks
    Pattern re = Pattern.compile("^([^\\d].*?)(?=^[^\\d])",Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()){
      // group 0 is the whole match, group 1 the captured block
      for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
        System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

$matches Array:
(
    [0] => Array
        (
            [0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

            [1] => 

        )

    [1] => Array
        (
            [0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text

            [1] => 

        )

)
answered 2013-05-31T13:41:19.313
0

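Use '\\' for a backslash, '\r|\r\n' for a line break, and '\d{4}' for 4 digits:

.*(\r|\r\n)

(your first blah line)

\\.*(\r|\r\n)

(your backslash lines)

((\d{4}.*(\r|\r\n))+(\r|\r\n))+

(your 4-digit block ending with an empty line, with the whole thing repeated by +)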
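
A minimal Java sketch of the last expression, under two assumptions that are not part of the answer: the line-break alternative also accepts a bare \n (Unix line endings), and ^ with MULTILINE anchors each digit line at a line start. The names DigitBlocksSketch, NL and firstRun, and the sample input, are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitBlocksSketch {
    // Assumption: accept \r\n, \r or \n as one line break.
    static final String NL = "(?:\\r\\n|\\r|\\n)";

    // One block = one or more four-digit lines followed by an empty line;
    // the outer + allows several such blocks in a row.
    static final Pattern BLOCKS = Pattern.compile(
        "(?:(?:^\\d{4}.*" + NL + ")+" + NL + ")+", Pattern.MULTILINE);

    // Hypothetical helper: returns the first run of blocks, or null if none.
    static String firstRun(String input) {
        Matcher m = BLOCKS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Made-up input with two digit blocks, each closed by an empty line.
        String input = "header\n1111 a\n2565 b\n\n3333 c\n\nrest\n";
        System.out.println(firstRun(input));
    }
}
```

If the number of blocks is known in advance, as the question suggests with 10 repetitions, the outer + could be replaced by {10}.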

answered 2013-05-31T12:49:13.390