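I need to split a Java string at every " character. The main catch is that the preceding character must not be a backslash (\).

So these strings would be split like this: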

asdnaoe"asduwd"adfdgb         =>   asdnaoe, asduwd, adfdgb
addfgmmnp"fd asd\"das"fsfk    =>   addfgmmnp, fd asd\"das, fsfk

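Is there a simple way to do this with a regular expression? (I'm using RegEx because it is the easiest thing for me to code. Performance is not an issue either...)

Thanks in advance.

This is how I solved it: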

private static String[] split(String s) {
    char[] cs = s.toCharArray();

    int n = 1;

    for (int i = 0; i < cs.length; i++) {
        if (cs[i] == '"') {
            int sn = 0;

            for (int j = i - 1; j >= 0; j--) {
                if (cs[j] == '\\')
                    sn += 1;
                else
                    break;
            }

            if (sn % 2 == 0)
                n += 1;
        }
    }

    String[] result = new String[n];

    int lastBreakPos = 0;
    int index = 0;
    for (int i = 0; i < cs.length; i++) {
        if (cs[i] == '"') {
            int sn = 0;

            for (int j = i - 1; j >= 0; j--) {
                if (cs[j] == '\\')
                    sn += 1;
                else
                    break;
            }

            if (sn % 2 == 0) {
                char[] splitcs = new char[i - lastBreakPos];

                System.arraycopy(cs, lastBreakPos, splitcs, 0, i - lastBreakPos);
                lastBreakPos = i + 1;

                result[index] = new StringBuilder().append(splitcs).toString();
                index += 1;
            }
        }
    }

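    // Copy the final segment: everything after the last unescaped quote.
    // Its length is cs.length - lastBreakPos (subtracting one more would drop the last character).
    char[] splitcs = new char[cs.length - lastBreakPos];

    System.arraycopy(cs, lastBreakPos, splitcs, 0, cs.length - lastBreakPos);

    result[index] = new StringBuilder().append(splitcs).toString();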

    return result;
}

Anyway, thanks for all the great replies! (Oh, that said, I'll be using @biziclop's or @Alan Moore's version, since they are shorter and probably more efficient! =)

3 Answers

Sure, just use

(?<!\\)"

A quick PowerShell test:

PS> 'addfgmmnp"fd asd\"das"fsfk' -split '(?<!\\)"'
addfgmmnp
fd asd\"das
fsfk
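
For Java readers, the same pattern could be passed to String.split roughly as follows. This is only a sketch: the class and method names (LookbehindSplitDemo, splitOnUnescapedQuotes) are made up for illustration, and the only real work is escaping the regex for a Java string literal.

```java
import java.util.Arrays;

// Hypothetical demo class; the names here are assumptions for illustration only.
final class LookbehindSplitDemo {

    // The regex (?<!\\)" as a Java string literal: the four backslashes in the
    // source become two in the string, which the regex engine reads as one
    // literal backslash; \" is just the quote character.
    static String[] splitOnUnescapedQuotes(String s) {
        return s.split("(?<!\\\\)\"");
    }

    public static void main(String[] args) {
        // Raw text: addfgmmnp"fd asd\"das"fsfk
        String input = "addfgmmnp\"fd asd\\\"das\"fsfk";
        // prints [addfgmmnp, fd asd\"das, fsfk]
        System.out.println(Arrays.toString(splitOnUnescapedQuotes(input)));
    }
}
```

Keep in mind that String.split drops trailing empty strings, so an input that ends with an unescaped quote will not produce an empty last element.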

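However, this will not split on \\" (an escaped backslash followed by a normal quote [at least under the escaping rules of most C-like languages]). You can't really fix that in Java with a lookbehind, though, because lookbehinds of arbitrary length are not supported: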

PS> 'addfgmmnp"fd asd\\"das"fsfk' -split '(?<!\\)"'
addfgmmnp
fd asd\\"das
fsfk

Normally you would expect a proper solution to split at the remaining " as well, because it is not really escaped.
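
One way to get that behaviour without a lookbehind is to count the backslashes directly in front of each quote and treat the quote as escaped only when the count is odd. The sketch below assumes exactly that rule; ParitySplitDemo, isEscaped and splitOnUnescapedQuotes are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are assumptions for illustration only.
final class ParitySplitDemo {

    // A quote counts as escaped only if an odd number of backslashes directly precedes it.
    static boolean isEscaped(String s, int quotePos) {
        int backslashes = 0;
        for (int j = quotePos - 1; j >= 0 && s.charAt(j) == '\\'; j--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    static List<String> splitOnUnescapedQuotes(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"' && !isEscaped(s, i)) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // trailing segment, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        // Raw text: addfgmmnp"fd asd\\"das"fsfk
        // prints [addfgmmnp, fd asd\\, das, fsfk]
        System.out.println(splitOnUnescapedQuotes("addfgmmnp\"fd asd\\\\\"das\"fsfk"));
    }
}
```

Unlike String.split, this version keeps empty segments, including a trailing one when the input ends with a quote.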

answered 2012-05-29T19:01:53.470

You can solve this with a Java regex; just don't use split().

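import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it is only here so the example compiles.
public class SplitDemo
{
  public static void main(String[] args) throws Exception
  {
    String[] strs = {
        "asdnaoe\"asduwd\"adfdgb",
        "addfgmmnp\"fd asd\\\"das\"fsfk"
    };

    for (String str : strs)
    {
      System.out.printf("%n%-28s=>  %s%n", str, splitIt(str));
    }
  }

  public static List<String> splitIt(String s)
  {
    ArrayList<String> result = new ArrayList<String>();
    // Match runs of ordinary characters and backslash escapes. Unescaped
    // quotes are never matched, so they act as the separators.
    Matcher m = Pattern.compile("([^\"\\\\]|\\\\.)+").matcher(s);
    while (m.find())
    {
      result.add(m.group());
    }
    return result;
  }
}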

Output:

asdnaoe"asduwd"adfdgb       => [asdnaoe, asduwd, adfdgb]

addfgmmnp"fd asd\"das"fsfk  => [addfgmmnp, fd asd\"das, fsfk]

The core regex, [^"\\]|\\., consumes anything that's not a backslash or a quotation mark, or a backslash followed by anything--so \\\" would be matched as an escaped backslash (\\) followed by an escaped quote (\").
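
To make that tokenization visible, here is a small sketch that applies the core regex without the surrounding (...)+ so every match is a single token. TokenDemo and tokens are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions for illustration only.
final class TokenDemo {

    // The core regex [^"\\]|\\. with no repetition, so each match is one token.
    private static final Pattern TOKEN = Pattern.compile("[^\"\\\\]|\\\\.");

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw text: a\\\"b  (a, three backslashes, a quote, b)
        // prints [a, \\, \", b]
        System.out.println(tokens("a\\\\\\\"b"));
    }
}
```

The first two backslashes are consumed together as one escape pair, which leaves the third backslash to pair with the quote, so the quote is treated as escaped.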

answered 2012-05-30T05:17:14.977

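FYI, here is a non-regex solution that also handles an escaped \. (In real life this could be simplified, since the START_NEW state isn't actually needed, but I tried to write it in a way that is easier to read.)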

import java.util.LinkedList;
import java.util.List;

public class Splitter {

    private enum State {
        IN_TEXT, ESCAPING, START_NEW;
    }

    public static List<String> split( String source ) {
        LinkedList<String> ret = new LinkedList<String>();
        StringBuilder sb = new StringBuilder();
        State state = State.START_NEW;
        for( int i = 0; i < source.length(); i++ ) {
            char next = source.charAt( i );
            if( next == '\\' && state != State.ESCAPING ) {
                state = State.ESCAPING;
            } else if( next == '\\' && state == State.ESCAPING ) {
                state = State.IN_TEXT;
            } else if( next == '"' && state != State.ESCAPING ) {
                ret.add( sb.toString() );
                sb = new StringBuilder();
                state = State.START_NEW;
            } else {
                state = State.IN_TEXT;
            }
            if( state != State.START_NEW ) {
                sb.append( next );
            }
        }
        ret.add( sb.toString() );
        return ret;
    }

}
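
For comparison, the simplification mentioned above might look something like the following, with the enum replaced by a single boolean flag. This is a sketch under the assumption that the escaping rules stay exactly the same; SimpleSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified variant; the class name is an assumption for illustration only.
final class SimpleSplitter {

    static List<String> split(String source) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaping = false; // true when the previous char was a backslash that starts an escape
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (escaping) {
                // Whatever follows an escaping backslash is kept verbatim.
                current.append(c);
                escaping = false;
            } else if (c == '\\') {
                current.append(c);
                escaping = true;
            } else if (c == '"') {
                // Unescaped quote: close the current segment and start a new one.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```

Calling SimpleSplitter.split on the raw text addfgmmnp"fd asd\"das"fsfk should give the same three segments as the enum-based version.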
answered 2012-05-29T20:23:09.313