
I have the following pattern:

Pattern TAG = Pattern.compile("(<[\\w]+>)|(</[\\w]+>)");

Please note the | in the pattern.

And I have a method that does some processing with this pattern:

private String format(String s){
    Matcher m = TAG.matcher(s);
    StringBuffer sb = new StringBuffer();

    while(m.find()){
        //This is where I need to find out what part
        //of | (or) matched in the pattern
        // to perform additional processing


    }
    return sb.toString();
}

I would like to perform different actions depending on which part of the OR matched. I know I could split the pattern into two separate patterns and match on each, but that is not the solution I am looking for: my actual regex is much more complex, and what I am trying to accomplish works best in a single loop with a single regex. So my question is:

Is there a way in Java to find out which part of the OR matched in the regex?

EDIT: I am also aware of the m.group() functionality, but it does not work for my case. The example below prints <TAG> and </TAG>: the first iteration of the loop matches <[\\w]+> and the second matches </[\\w]+>. However, I need to know which alternative matched on each iteration.

static Pattern u = Pattern.compile("<[\\w]+>|</[\\w]+>");

public static void main(String[] args) {
    String xml = "<TAG>044453</TAG>";

    Matcher m = u.matcher(xml);

    while (m.find()) {
        System.out.println(m.group(0));
    }
}

3 Answers


Take a look at the group() method on Matcher. You can do something like this:

if (m.group(1) != null) {
    // The first grouped parenthesized section matched
}
else if (m.group(2) != null) {
    // The second grouped parenthesized section matched
}

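Edit: reverted to the original group numbers - the extra parentheses are not needed. This should work for a pattern like the following: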

static Pattern TAG = Pattern.compile("(<[\\w]+>)|(</[\\w]+>)");
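
As a minimal sketch of how this check could slot into the question's format method: the class name TagFormatSketch and the "[open ...]" / "[close ...]" replacement strings are hypothetical placeholders for whatever per-branch processing is actually needed. It relies only on Matcher.appendReplacement, Matcher.appendTail and Matcher.quoteReplacement from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagFormatSketch {
    // Group 1 captures an opening tag, group 2 a closing tag.
    private static final Pattern TAG = Pattern.compile("(<\\w+>)|(</\\w+>)");

    static String format(String s) {
        Matcher m = TAG.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                // First alternative matched: placeholder handling for an opening tag.
                replacement = "[open " + m.group(1) + "]";
            } else {
                // Group 1 is null, so the second alternative matched.
                replacement = "[close " + m.group(2) + "]";
            }
            // quoteReplacement keeps any '$' or '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: [open <TAG>]044453[close </TAG>]
        System.out.println(format("<TAG>044453</TAG>"));
    }
}
```

Only one of the two groups takes part in any given match, so a single null check is enough to tell the alternatives apart.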
answered 2013-06-26T19:48:00.827

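You don't need to wrap \\w in [] because it is already a character class. You can also put each alternative of the OR in parentheses so that you can use them as groups (a group that did not take part in the match returns null). Your code could then look like this: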

static Pattern u = Pattern.compile("(<\\w+>)|(</\\w+>)");

public static void main(String[] args) {
    String xml = "<TAG>044453</TAG>";

    Matcher m = u.matcher(xml);

    while (m.find()) {
        if (m.group(1) != null) { // group 1, (<\\w+>), matched
            System.out.println("I found <...> tag: "+m.group(0));
        } else { // group 1 is null, so (</\\w+>) must have matched
            System.out.println("I found </...> tag: "+m.group(0));
        }
    }
}

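You can also change the pattern slightly to <(/?)\\w+>, making the / optional and putting it in parentheses (which makes it group 1). If the tag has no /, group 1 contains just the empty string "", so you can change the logic to something like:

        if ("".equals(m.group(1))) { // group 1 is empty: no slash, so an opening tag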
            System.out.println("I found <...> tag: " + m.group(0));
        } else { 
            System.out.println("I found </...> tag: " + m.group(0));
        }
answered 2013-06-26T20:45:10.207

You should rewrite your pattern by factoring out the common parts:

xy|xz  => x(y|z)
xy|x   => xy?
yx|x   => y?x

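Then, by putting the interesting part y? in parentheses, you can use group() to check whether it was set. Note that (y)? yields null when y is absent, while (y?) yields the empty string.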
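
A rough sketch of the factored form applied to the tag pattern from the question, assuming the only difference between the two alternatives is the leading slash. The class name FactoredTagSketch, the describe method and its "open:"/"close:" output format are made up for illustration. The optional part is written as (/)? so that group(1) is null for an opening tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FactoredTagSketch {
    // "<\\w+>|</\\w+>" factored into "<", an optional "/", then "\\w+>".
    // Group 1 is the optional slash, group 2 the tag name.
    private static final Pattern TAG = Pattern.compile("<(/)?(\\w+)>");

    static String describe(String s) {
        Matcher m = TAG.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // group(1) is null when the optional "(/)?" did not take part in the match.
            String kind = (m.group(1) == null) ? "open" : "close";
            out.append(kind).append(':').append(m.group(2)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // prints: open:TAG close:TAG
        System.out.println(describe("<TAG>044453</TAG>"));
    }
}
```

With (/?) instead of (/)?, group 1 always takes part in the match, so the test would compare against the empty string rather than null.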

answered 2013-06-26T20:41:18.647