5

为了澄清,我想知道如何使用正则表达式进行匹配:

ab
aabb
aaabbb 
...

我刚刚发现这适用于Perl

if ($exp =~ /^(a(?1)?b)$/)

要理解这一点,请查看字符串,就好像它是从外向内生长的,而不是从左向右生长的:

ab
a(ab)b
aa(ab)bb

(?1)是对外部括号集的引用。?对于最后一种情况(从外到内),我们需要after,什么都没有,并且?表示前面表达式的 0 或 1(所以它本质上是我们的基本情况)。

我的问题是:(?1)Java 中的等价物是什么?

4

4 回答 4

5

In general, regular expressions are limited to regular languages, which means that since they are equivalent to languages acceptable by DFAs (discrete finite automata), they can't count, as generalized counting requires an infinite number of states to represent the infinite number of possible counted values.

Since discrete != infinte, you can't really count, but you can do some limited types of matching like in your (a (something) b) example.

Discusses a few limitations of DFAs (and Regular Languages / Regular Expressions by extension) http://www.cs.washington.edu/education/courses/cse599/99sp/admin/Slides/Week2/sld012.htm

A better, but more verbose slide that expands on the limitations by going into DFAs in (still a bit high level) detail http://www.cs.princeton.edu/courses/archive/spr05/cos126/lectures/18.pdf

By the way, the inside-out expansion is a cool DFA trick to basically side-step the actual need to count by using pattern matching that happens to rebuild the mirror image of the string. It looks like counting, but will fall apart as soon as you do more interesting things, like require matching in order (as opposed to mirrored order matching).

于 2013-06-11T20:46:12.127 回答
1

您可以使用PatternMatcher类来计算两个字符之一的出现次数,然后使用{n}语法在正则表达式中强制执行此数字。

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String  s       = "aaaabbbb";
        Pattern pattern = Pattern.compile("a");
        Matcher matcher = pattern.matcher(s);

        // count all 'a's
        int count = 0;
        while (matcher.find())
            count++;

        // now enforce count matches for a and b in your regular expression
        String rExp = String.format("a{%d}b{%d}", count, count);
        Pattern matchSameCount = Pattern.compile(rExp);

        Matcher m2 = matchSameCount.matcher(s);
        System.out.println( m2.matches()); 
        // prints true
    }
}

总的来说,这是一个多一点的工作,但这是我能想到的目前唯一可行的方法。

于 2013-06-11T20:57:23.580 回答
0

该语言不规则(见下文解释)。所以,你不能用正则表达式匹配那种语言。

您将需要一个堆栈来保存重复的单词。

我建议您阅读以下链接:

NFA:http ://en.wikipedia.org/wiki/Nondeterministic_finite_automaton 解释为什么这种语言不规则:为什么是 {a^nb^n | n >= 0} 不规则?

于 2013-06-11T20:58:35.670 回答
-5

Not being a Java developer, but plenty of experience in Perl, why not just /^a+b+$/ ?

I'm wondering if I've misunderstood your Q!

;-)

于 2013-06-11T20:45:07.763 回答