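I've been using the code below to try to extract different parts of the text I supply.
It is supposed to pick out the number, then put anything enclosed in square brackets ([...]) or double quotes ("...") into groups. Here is the code.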
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Launcher2 {
        /**
         * @param args
         */
        public static void main(String[] args) {
            PrintRegexes("100.000[$₮-45]");
        }

        public static void PrintRegexes(String textToMatch) {
            Pattern p = Pattern.compile("(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            Matcher m = p.matcher(textToMatch);
            if (m.find()) {
                for (int groups = 0; groups < m.groupCount(); groups++) {
                    System.out.println("Group " + groups + " contains " + m.group(groups));
                }
                for (int groups = 0; m.find(groups); groups++) { //this will error, but right now, it's the least of my concerns
                    System.out.println("Group " + groups + " contains " + m.group(groups));
                }
            }
        }
    }

This is the output I get:

    Group 0 contains 100.000[$₮-45]
    Group 1 contains null
    Group 2 contains 100.000
    Group 3 contains [$₮-45]
    Group 0 contains 100.000[$₮-45]
    Group 1 contains null
    Group 2 contains 0.000
    Group 3 contains [$₮-45]
    Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 4 //don't care about this, I've got bigger strings(fish) to regex(fry) at the moment!
        at java.util.regex.Matcher.group(Unknown Source)
        at Launcher2.PrintRegexes(Launcher2.java:21)
        at Launcher2.main(Launcher2.java:10)

All the groups are the same in both loops except group 2: one prints 0.000 and the other prints 100.000.
Why is that?
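
One thing that may be worth isolating here is the call in the second loop. As far as the Java documentation describes it, Matcher.find(int start) resets the matcher and searches again starting at the given character index; the argument is not a group number. The sketch below is only an illustration under that assumption: it reuses the pattern from the question, and the class name FindStartDemo and its two helper methods are hypothetical names made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindStartDemo {
    // Same pattern and flags as in the question.
    static final Pattern P = Pattern.compile(
            "(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Group 2 of the first match found by the no-argument find(), or null if nothing matches. */
    static String groupTwo(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    /** Group 2 of the first match found at or after character index start, or null. */
    static String groupTwoFrom(String text, int start) {
        Matcher m = P.matcher(text);
        // find(int) resets the matcher and begins the search at index start.
        return m.find(start) ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "100.000[$₮-45]";
        System.out.println(groupTwo(text));        // prints 100.000
        System.out.println(groupTwoFrom(text, 2)); // prints 0.000
    }
}
```

If that reading is right, the second loop runs a fresh search from index 0, 1, 2, ... and prints one group from each of those different matches, which would explain why only the group printed on the third pass (group 2, searched from index 2) looks different.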
If I put something both before and after the number, this behaviour goes away.
If I only put something in front, I get this output:

    Group 0 contains [$₮-45]100.000
    Group 1 contains [$₮-45]
    Group 2 contains 100.000
    Group 3 contains null
    Group 0 contains [$₮-45]100.000
    Group 1 contains null
    Group 2 contains 45
    Group 3 contains null

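Even weirder! The strangest part (to me) is that it works on www.debuggex.com.
Have I written my pattern wrong? Or is the matcher not working out its groups at the point where Matcher m = p.matcher(textToMatch); constructs it, and is that what affects its behaviour?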
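
On the question of when the groups get worked out, here is a minimal sketch, assuming the documented behaviour of java.util.regex.Matcher: p.matcher(...) only creates the matcher, the capture groups are filled in by a successful find(), and groupCount() does not count group 0, so a loop over all groups needs an inclusive upper bound. The class name GroupListingDemo and its helper methods are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupListingDemo {
    // Same pattern and flags as in the question.
    static final Pattern P = Pattern.compile(
            "(\\[.*?\\]|\".*?\")?.*?(\\d{1,3}(?:,\\d{3})*?(?:\\.\\d+)?).*?(\\[.*?\\]|\".*?\")",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Groups 0..groupCount() of the first match, or an empty list if nothing matches. */
    static List<String> groupsOfFirstMatch(String text) {
        Matcher m = P.matcher(text);   // no matching has been attempted yet
        List<String> groups = new ArrayList<>();
        if (m.find()) {                // the groups are captured by this call
            // groupCount() excludes group 0, hence the inclusive bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null when a group did not take part in the match
            }
        }
        return groups;
    }

    /** True if asking for a group before any match attempt throws IllegalStateException. */
    static boolean groupBeforeFindThrows(String text) {
        try {
            P.matcher(text).group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "100.000[$₮-45]";
        List<String> groups = groupsOfFirstMatch(text);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("Group " + i + " contains " + groups.get(i));
        }
        System.out.println("group() before find() throws: " + groupBeforeFindThrows(text));
    }
}
```

The design choice here is to call find() exactly once and read every group from that single match, so all the printed groups belong to the same search.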