import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexGroups {
  public static void main(String[] args) {
    String line = "This order was placed for QT3000! OK?";
    String pattern = "(.*)(\\d+)(.*)";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);

    // Now create a Matcher object
    Matcher m = r.matcher(line);
    if (m.find()) {
      System.out.println("Found value: " + m.group(1));
      System.out.println("Found value: " + m.group(2));
      System.out.println("Found value: " + m.group(3));
    }
  }
}

The output is:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

whereas I expected the output to be:

Found value: This order was placed for QT3000! OK?
Found value: 3000
Found value: This order was placed for QT3000! OK?

I expected this output because:

If the pattern is "(.*)", the output for m.group(1) is "This order was placed for QT3000! OK?"
If the pattern is "(\\d+)", the output for m.group(1) is "3000"
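A minimal sketch that checks the two single-group patterns from the reasoning above in isolation. The class name `SingleGroupCheck` and the helper `firstGroup` are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleGroupCheck {
  // Returns group(1) of the first match of regex in input, or null if there is no match.
  static String firstGroup(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    String line = "This order was placed for QT3000! OK?";
    System.out.println(firstGroup("(.*)", line));   // prints the whole line
    System.out.println(firstGroup("(\\d+)", line)); // prints 3000
  }
}
```

Each pattern on its own behaves as expected; the difference only shows up when the groups are combined in one pattern and have to share the same characters.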

So when the pattern is "(.*)(\\d+)(.*)", why don't I get the expected output?


2 Answers


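.* matches (and consumes) as many characters as it can before \\d+ gets its turn. By the time it reaches \\d+, a single digit is enough for \\d+ to match.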

So you need to make .* lazy:

(.*?)(\\d+)(.*)
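A minimal sketch of the question's program with the lazy first group swapped in. The class name `LazyFix` and the `groups` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyFix {
  // Returns groups 1..3 of the first match, or null if the pattern does not match.
  static String[] groups(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (!m.find()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3)};
  }

  public static void main(String[] args) {
    String line = "This order was placed for QT3000! OK?";
    // Lazy first group: (.*?) takes as few characters as possible.
    for (String g : groups("(.*?)(\\d+)(.*)", line)) {
      System.out.println("Found value: " + g);
    }
  }
}
```

Group 2 is now "3000". Groups 1 and 3 hold only the text before and after the number ("This order was placed for QT" and "! OK?"), not the whole line: consecutive groups split the matched text between them, so no two of them can each contain all of it.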

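If you want the details: .* first matches the entire string, then backtracks one character at a time so that the rest of the regex, (\\d+)(.*), can match the characters that come after it. It only needs to backtrack until it holds this much: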

This order was placed for QT300

At that point the rest of the regex, (\\d+)(.*), is satisfied, so matching ends.
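To see the result of that backtracking as numbers, here is a minimal sketch that prints the start and end index of each group of the greedy pattern using `Matcher.start(int)` and `Matcher.end(int)`. The class name `GreedySpans` and the `spans` helper are hypothetical. Group 1 ends at index 31, just before the last 0, which leaves (\\d+) a single digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedySpans {
  // Returns {start, end} (end exclusive) for each of groups 1..3 of the first match.
  static int[][] spans(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (!m.find()) {
      throw new IllegalArgumentException("no match");
    }
    int[][] result = new int[3][];
    for (int i = 1; i <= 3; i++) {
      result[i - 1] = new int[] {m.start(i), m.end(i)};
    }
    return result;
  }

  public static void main(String[] args) {
    String line = "This order was placed for QT3000! OK?";
    int[][] s = spans("(.*)(\\d+)(.*)", line);
    for (int i = 0; i < s.length; i++) {
      System.out.println("group " + (i + 1) + ": [" + s[i][0] + ", " + s[i][1] + ") \""
          + line.substring(s[i][0], s[i][1]) + "\"");
    }
  }
}
```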

answered 2013-09-07T17:22:45.313

This happens because the first (.*) is greedy: it eats up as much as it can while still allowing (\d+)(.*) to match the rest of the string.

Basically, the matching goes like this. At the start, the first .* swallows the entire string:

This order was placed for QT3000! OK?
                                     ^

However, \d+ cannot match at this position, so we backtrack:

This order was placed for QT3000! OK?
                                    ^
This order was placed for QT3000! OK?
                                   ^
...

This order was placed for QT3000! OK?
                               ^

At this position, \d+ can match, so we continue:

This order was placed for QT3000! OK?
                                ^

and the final .* matches the rest of the string.

That explains the output you are seeing.
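The walkthrough above can be imitated by hand for this one pattern. The sketch below is a hypothetical helper, not how `java.util.regex` is implemented: it assumes the input has no line terminators and handles only (.*)(\d+)(.*). The first group starts with the whole string and gives back one character per step until a digit follows it.

```java
class GreedyBacktrackSim {
  // Java's \d means [0-9] unless UNICODE_CHARACTER_CLASS is set, so test ASCII digits only.
  static boolean isDigit(char c) {
    return c >= '0' && c <= '9';
  }

  // Hypothetical helper: imitates matching (.*)(\d+)(.*) from position 0.
  static String[] match(String s) {
    // Greedy (.*) first holds the whole string (p = s.length()),
    // then backtracks by giving back one character per step.
    for (int p = s.length(); p >= 0; p--) {
      // (\d+) can only start at p if the character there is a digit.
      if (p < s.length() && isDigit(s.charAt(p))) {
        int q = p;
        while (q < s.length() && isDigit(s.charAt(q))) {
          q++; // (\d+) is greedy too: take every following digit
        }
        // The trailing (.*) always matches whatever is left.
        return new String[] {s.substring(0, p), s.substring(p, q), s.substring(q)};
      }
    }
    return null; // no digit anywhere: the pattern does not match
  }

  public static void main(String[] args) {
    for (String part : match("This order was placed for QT3000! OK?")) {
      System.out.println("Found value: " + part);
    }
  }
}
```

Because the loop counts p downward, the first digit it meets is the last digit in the string, which is why group 2 ends up as a lone "0".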


You can fix this by making the first (.*) lazy:

(.*?)(\d+)(.*)

The search for a match of (.*?) starts with the empty string, and each time it backtracks it swallows one more character:

This order was placed for QT3000! OK?
^
This order was placed for QT3000! OK?
 ^
...

This order was placed for QT3000! OK?
                            ^

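At this point \d+ can match (and, being greedy, takes all of 3000), and .* can match too. That completes the match attempt, and group 2 now holds 3000 as you wanted.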
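The lazy walkthrough can be imitated the same way. This is a hypothetical, pattern-specific sketch (assuming no line terminators in the input) that grows the first group from the empty string and, for comparison, runs the real `java.util.regex` engine on (.*?)(\d+)(.*); the two should agree.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyBacktrackSim {
  static boolean isDigit(char c) {
    return c >= '0' && c <= '9'; // same as \d without UNICODE_CHARACTER_CLASS
  }

  // Hypothetical helper: imitates matching (.*?)(\d+)(.*) from position 0.
  static String[] match(String s) {
    // Lazy (.*?) starts empty (p = 0) and takes one more character per step.
    for (int p = 0; p <= s.length(); p++) {
      if (p < s.length() && isDigit(s.charAt(p))) {
        int q = p;
        while (q < s.length() && isDigit(s.charAt(q))) {
          q++; // (\d+) is still greedy
        }
        return new String[] {s.substring(0, p), s.substring(p, q), s.substring(q)};
      }
    }
    return null; // no digit anywhere
  }

  // The real engine, for comparison.
  static String[] regex(String s) {
    Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(s);
    return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
  }

  public static void main(String[] args) {
    String line = "This order was placed for QT3000! OK?";
    System.out.println(Arrays.toString(match(line)));
    System.out.println(Arrays.toString(regex(line)));
  }
}
```

Counting p upward means the first digit found is the first digit in the string, so (\d+) starts at the 3 and captures the full number.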

answered 2013-09-07T17:23:50.823