
I want to extract the first 10 digits, if they exist, from a long string, ignoring leading zeros. Additionally, if there are only zeros, return a single zero, and if there are no digits at all, return an empty string. I want to do this with a single find.

For example:

  • "abcd00111.g2012asd" should match "1112012"
  • "aktr0011122222222222ddd" should match "1112222222"
  • "asdas000000asdasds0000" should match "0"
  • "adsads.cxzv.;asdasd" should match ""

Here is what I have tried so far (Ideone demo):

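import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One non-zero digit followed by nine more digits, all contiguous
Pattern p = Pattern.compile("[1-9]{1}+[0-9]{9}");
Matcher m = p.matcher(str);
if (m.find()) {
  String match = m.group();
  System.out.println(match);
}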

The problem is that this regex requires nine consecutive digits after the first non-zero digit, whereas I need any nine digits (possibly with non-digit characters in between).
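To see the problem concretely, here is a minimal runnable version of the attempt above. The class name AttemptDemo and the helper firstMatch are hypothetical, added only for illustration. The pattern finds a match in the second example, where the ten digits happen to be contiguous, but not in the first, where ".g" splits the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptDemo {
    // The pattern from the question: one non-zero digit, then nine more digits
    // that must be contiguous (no other characters allowed in between).
    static final Pattern P = Pattern.compile("[1-9]{1}+[0-9]{9}");

    // Hypothetical helper: returns the first match, or null when there is none.
    static String firstMatch(String str) {
        Matcher m = P.matcher(str);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("aktr0011122222222222ddd")); // 1112222222
        System.out.println(firstMatch("abcd00111.g2012asd"));      // null
    }
}
```

The longest digit runs in "abcd00111.g2012asd" are only five and four characters long, so no ten-character window of digits exists there.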

Notice that in the code I use if (m.find()) instead of while (m.find()), because I want to find the match in a single run.

UPDATE

Based on the comments, I understand that this cannot be done with a regex in a single run.

The answer does not have to be regex-based, but it should be as efficient as possible, since I will call this method many times.


2 Answers


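In the general case, this is not possible with a single find. If you know the maximum number of runs of consecutive digits, it can be done; if you don't, it is not possible, at least not with the features supported by Java's Pattern class. I was wrong. Kobi's comment shows that it is possible with a single regex. I'll copy the comment here: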

Oh, this is possible in regex by capturing each of the 10 digits, for example: ^[\D0]*(\d)\D*(?:(\d)\D*(?:(\d)\D*(?:(\d)\D*(?#{6 more times}))?)?)?, but it's really ugly and doesn't scale well.

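You still need to concatenate the groups, though. The logic at the start of the regex is very neat: because [\D0]* is greedy, the first (\d) lands on the first non-zero digit, after all the leading zeros, if such a digit exists; if there is no non-zero digit, backtracking makes it capture the last zero.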
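A minimal sketch of the full ten-digit version in Java. Rather than typing the nested groups out by hand, the helper buildPattern generates Kobi's pattern, and firstTenDigits concatenates whichever groups took part in the match. The class and method names are my own, not from the comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TenDigitsRegex {
    // Builds Kobi's pattern for n digits: the first (\d) is mandatory and each
    // further (\d) sits inside an optional group nested in the previous one.
    static String buildPattern(int n) {
        StringBuilder sb = new StringBuilder("^[\\D0]*(\\d)\\D*");
        for (int i = 1; i < n; i++) {
            sb.append("(?:(\\d)\\D*");
        }
        for (int i = 1; i < n; i++) {
            sb.append(")?");
        }
        return sb.toString();
    }

    static final Pattern TEN_DIGITS = Pattern.compile(buildPattern(10));

    // Concatenates the captured digits; groups that did not participate are null.
    static String firstTenDigits(String str) {
        Matcher m = TEN_DIGITS.matcher(str);
        if (!m.find()) {
            return "";                  // the string contains no digit at all
        }
        StringBuilder result = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            String digit = m.group(g);
            if (digit == null) {
                break;                  // fewer than ten digits after the leading zeros
            }
            result.append(digit);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstTenDigits("abcd00111.g2012asd"));      // 1112012
        System.out.println(firstTenDigits("aktr0011122222222222ddd")); // 1112222222
        System.out.println(firstTenDigits("asdas000000asdasds0000"));  // 0
        System.out.println(firstTenDigits("adsads.cxzv.;asdasd"));     // (empty line)
    }
}
```

Because every later group is nested inside the previous optional group, a null group means all groups after it are null too, which is why the loop can stop at the first null.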


If you set efficiency aside and just want short code:

String digitOnly = str.replaceAll("\\D+", "");               // keep only the digits
String noLeadingZero = digitOnly.replaceFirst("^0+", "");   // drop leading zeros
String result = digitOnly.isEmpty() ? "" :                   // no digits at all
                noLeadingZero.isEmpty() ? "0" :              // digits, but all zeros
                noLeadingZero.substring(0, Math.min(noLeadingZero.length(), 10));

Frankly, looping over the string with a StringBuilder is sufficient, and it should be faster than a regex solution.

StringBuilder output = new StringBuilder();
boolean hasDigit = false;     // have we seen any digit at all?
boolean leadingZero = true;   // are we still inside the leading zeros?
for (int i = 0; i < str.length() && output.length() < 10; i++) {
    char currChar = str.charAt(i);
    if ('0' <= currChar && currChar <= '9') {
        hasDigit = true;
        if (currChar != '0') {
            output.append(currChar);
            leadingZero = false;
        } else if (!leadingZero) { // currChar == '0'
            output.append(currChar);
        } // Ignore leading zero
    }
}

String result = !hasDigit ? "" :
                output.length() == 0 ? "0" :
                output.toString();
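For reuse, the same loop can be wrapped in a static method and checked against the examples from the question. This is a sketch; the class name DigitExtractor is hypothetical, and the two zero-handling branches are merged into one equivalent condition.

```java
class DigitExtractor {
    // The loop from the answer above, wrapped in a reusable static method.
    static String firstTenDigits(String str) {
        StringBuilder output = new StringBuilder(10);
        boolean hasDigit = false;
        boolean leadingZero = true;
        for (int i = 0; i < str.length() && output.length() < 10; i++) {
            char currChar = str.charAt(i);
            if ('0' <= currChar && currChar <= '9') {
                hasDigit = true;
                if (currChar != '0' || !leadingZero) {
                    output.append(currChar);   // keep every digit from the first non-zero one on
                    leadingZero = false;
                }                              // otherwise: skip a leading zero
            }
        }
        return !hasDigit ? "" : output.length() == 0 ? "0" : output.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abcd00111.g2012asd", "aktr0011122222222222ddd",
            "asdas000000asdasds0000", "adsads.cxzv.;asdasd"
        };
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> \"" + firstTenDigits(s) + "\"");
        }
    }
}
```

Running main prints the four expected results from the question: "1112012", "1112222222", "0" and "". The loop also stops reading as soon as ten digits have been collected, so long inputs are not scanned to the end.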

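Performance test code. Note that you should adjust the parameters so that they resemble your actual input, to get a good approximation. I doubt the looping approach is slower than any regex-based approach; however, the difference only becomes significant at large scale.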
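A naive timing sketch along those lines, comparing the two approaches from this answer on generated input. The input generator, string length, input count and number of rounds are arbitrary assumptions, so tune them to your real data. Hand-written System.nanoTime loops are distorted by JIT warm-up and dead-code elimination; a harness such as JMH gives more trustworthy numbers.

```java
import java.util.Random;

class RoughBenchmark {
    // The short replaceAll/replaceFirst version from the answer.
    static String viaRegex(String str) {
        String digitOnly = str.replaceAll("\\D+", "");
        String noLeadingZero = digitOnly.replaceFirst("^0+", "");
        return digitOnly.isEmpty() ? "" :
               noLeadingZero.isEmpty() ? "0" :
               noLeadingZero.substring(0, Math.min(noLeadingZero.length(), 10));
    }

    // The single-pass StringBuilder loop from the answer.
    static String viaLoop(String str) {
        StringBuilder output = new StringBuilder(10);
        boolean hasDigit = false;
        boolean leadingZero = true;
        for (int i = 0; i < str.length() && output.length() < 10; i++) {
            char c = str.charAt(i);
            if ('0' <= c && c <= '9') {
                hasDigit = true;
                if (c != '0' || !leadingZero) {
                    output.append(c);
                    leadingZero = false;
                }
            }
        }
        return !hasDigit ? "" : output.length() == 0 ? "0" : output.toString();
    }

    // Assumed input shape: random letters, punctuation and (zero-heavy) digits.
    static String randomInput(Random rnd, int length) {
        String alphabet = "abcxyz.;000123456789";
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        String[] inputs = new String[100_000];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = randomInput(rnd, 60);
        }
        // Several rounds: the first ones mostly measure JIT warm-up.
        for (int round = 1; round <= 5; round++) {
            long sink = 0;   // consume the results so the work is not optimized away
            long t0 = System.nanoTime();
            for (String s : inputs) sink += viaRegex(s).length();
            long t1 = System.nanoTime();
            for (String s : inputs) sink += viaLoop(s).length();
            long t2 = System.nanoTime();
            System.out.printf("round %d: regex %d ms, loop %d ms (sink=%d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
        }
    }
}
```

Note that String.replaceAll and String.replaceFirst compile their pattern on every call; precompiling the patterns once would narrow the gap somewhat.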

answered 2013-02-27T06:51:55.087
String test = "sdfsd0000234.432004gr23.022";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < test.length(); i++) {
    if (Character.isDigit(test.charAt(i)))
        sb.append(test.charAt(i));          // keep digits only
}
String result = sb.toString();
result = result.replaceFirst("^0*", "");  // Remove leading zeros
// Note: this does not cap the result at 10 digits, and all-zero input yields "" rather than "0"
System.out.println(result);               // Will print 23443200423022
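One caveat: Character.isDigit accepts every Unicode decimal digit, not only 0 to 9, whereas the range check in the loop answer and Java's default \d and \D are ASCII-only. If the input may contain non-ASCII digits, the approaches can disagree. A small demonstration; the class name and the isAsciiDigit helper are hypothetical.

```java
class DigitCheckDemo {
    // ASCII-only check, equivalent to the range test used in the loop answer.
    static boolean isAsciiDigit(char c) {
        return '0' <= c && c <= '9';
    }

    public static void main(String[] args) {
        char arabicIndicThree = '\u0663';   // ARABIC-INDIC DIGIT THREE
        // Character.isDigit accepts any Unicode decimal digit...
        System.out.println(Character.isDigit(arabicIndicThree));             // true
        // ...while the plain range check only accepts ASCII '0'..'9'.
        System.out.println(isAsciiDigit(arabicIndicThree));                  // false
        // By default, Java's \d is ASCII-only as well.
        System.out.println(String.valueOf(arabicIndicThree).matches("\\d")); // false
    }
}
```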
answered 2013-02-27T06:55:32.173