Given the following string:

"foo bar-baz-zzz"

I want to split it at the characters " " and "-", keeping a delimiter in the text wherever it is not used as a split point, and get every possible combination of splits.

That is, I want to get a two-dimensional array containing

{{"foo", "bar", "baz", "zzz"}
,{"foo bar", "baz", "zzz"}
,{"foo", "bar-baz", "zzz"}
,{"foo bar-baz", "zzz"}
,{"foo", "bar", "baz-zzz"}
,{"foo bar", "baz-zzz"}
,{"foo", "bar-baz-zzz"}
,{"foo bar-baz-zzz"}}

Is there any built-in method in Java to split the string this way? Maybe in a library like Apache Commons? Or do I have to write a wall of for-loops?

5 Answers

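Here is a working recursive solution. I used a List<List<String>> instead of a two-dimensional array to make things easier. The code is a bit ugly and could probably be tidied up a little.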

Sample output:

$ java Main foo bar-baz-zzz
Processing: foo bar-baz-zzz
[foo, bar, baz, zzz]
[foo, bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar-baz-zzz]
[foo bar, baz, zzz]
[foo bar, baz-zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

Code:

import java.util.*;

public class Main {
  public static void main(String[] args) {
    // First build a single string from the command line args.
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = Arrays.asList(args).iterator();
    while (it.hasNext()) {
      sb.append(it.next());

      if (it.hasNext()) {
        sb.append(' ');
      }
    }

    process(sb.toString());
  }

  protected static void process(String str) {
    System.err.println("Processing: " + str);
    List<List<String>> results = new LinkedList<List<String>>();

    // Invoke the recursive method that does the magic.
    process(str, 0, results, new LinkedList<String>(), new StringBuilder());

    for (List<String> result : results) {
      System.err.println(result);
    }
  }

  protected static void process(String str, int pos, List<List<String>> resultsSoFar, List<String> currentResult, StringBuilder sb) {
    if (pos == str.length()) {
      // Base case: Reached end of string so add buffer contents to current result
      // and add current result to resultsSoFar.
      currentResult.add(sb.toString());
      resultsSoFar.add(currentResult);
    } else {
      // Step case: Inspect character at pos and then make recursive call.
      char c = str.charAt(pos);

      if (c == ' ' || c == '-') {
        // When we encounter a ' ' or '-' we recurse twice; once where we treat
        // the character as a delimiter and once where we treat it as a 'normal'
        // character.
        List<String> copy = new LinkedList<String>(currentResult);
        copy.add(sb.toString());
        process(str, pos + 1, resultsSoFar, copy, new StringBuilder());

        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      } else {
        sb.append(c);
        process(str, pos + 1, resultsSoFar, currentResult, sb);
      }
    }
  }
}
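The answer above collects a `List<List<String>>` and prints it. If you need the two-dimensional array from the question, a small conversion along these lines should work. `ToArray` and `toArray` are hypothetical names, and the result is a jagged `String[][]`, since the rows have different lengths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArray {
  // Converts each inner list into a String[] row; rows may differ in length.
  static String[][] toArray(List<List<String>> lists) {
    String[][] result = new String[lists.size()][];
    for (int i = 0; i < lists.size(); i++) {
      result[i] = lists.get(i).toArray(new String[0]);
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> results = new ArrayList<>();
    results.add(Arrays.asList("foo", "bar", "baz", "zzz"));
    results.add(Arrays.asList("foo bar-baz-zzz"));
    // Prints [[foo, bar, baz, zzz], [foo bar-baz-zzz]]
    System.out.println(Arrays.deepToString(toArray(results)));
  }
}
```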
answered 2009-08-25T15:40:24.807

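Here is a shorter version, written in a recursive style. I apologize for only being able to write it in Python. I like its conciseness; surely someone here can produce a Java version.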

def rec(h,t):
  if len(t)<2: return [[h+t]]
  if (t[0]!=' ' and t[0]!='-'): return rec(h+t[0], t[1:])
  return rec(h+t[0], t[1:]) + [ [h]+x for x in rec('',t[1:])]

Result:

>>> rec('',"foo bar-baz-zzz")
[['foo bar-baz-zzz'], ['foo bar-baz', 'zzz'], ['foo bar', 'baz-zzz'],
 ['foo bar', 'baz', 'zzz'], ['foo', 'bar-baz-zzz'], ['foo', 'bar-baz', 'zzz'],
 ['foo', 'bar', 'baz-zzz'], ['foo', 'bar', 'baz', 'zzz']]
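A straightforward Java port of `rec` might look like the sketch below. `Rec` is a hypothetical class name; the logic mirrors the Python line by line (`h` is the piece built so far, `t` the remaining input), so it should produce the same eight splittings in the same order.

```java
import java.util.ArrayList;
import java.util.List;

public class Rec {
  // Port of the Python rec(h, t).
  static List<List<String>> rec(String h, String t) {
    if (t.length() < 2) {
      // Base case: at most one character left; it always joins the current piece.
      List<List<String>> result = new ArrayList<>();
      List<String> single = new ArrayList<>();
      single.add(h + t);
      result.add(single);
      return result;
    }
    char c = t.charAt(0);
    // Branch 1: keep c as part of the current piece.
    List<List<String>> result = rec(h + c, t.substring(1));
    if (c == ' ' || c == '-') {
      // Branch 2: split at c; h becomes a finished piece, c is dropped.
      for (List<String> rest : rec("", t.substring(1))) {
        List<String> split = new ArrayList<>();
        split.add(h);
        split.addAll(rest);
        result.add(split);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(rec("", "foo bar-baz-zzz"));
  }
}
```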
answered 2009-08-25T16:52:20.573

Here is a class that lazily returns lists of split values:

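import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Split implements Iterator<List<String>> {
  private Split kid;                 private final Pattern pattern;
  private String subsequence;        private final Matcher matcher;
  private boolean done = false;      private final String sequence;
  public Split(Pattern pattern, String sequence) {
    this.pattern = pattern;          matcher = pattern.matcher(sequence);
    this.sequence = sequence;
  }

  // Each delimiter match (left to right) is tried as the last split point: a
  // child Split enumerates the splittings of the text before it, and the text
  // after it is appended unsplit. Finally the whole sequence is returned unsplit.
  @Override public List<String> next() {
    if (done) { throw new NoSuchElementException(); }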
    while (true) {
      if (kid == null) {
        if (matcher.find()) {
          subsequence = sequence.substring(matcher.end());
          kid = new Split(pattern, sequence.substring(0, matcher.start()));
        } else { break; }
      } else {
        if (kid.hasNext()) {
          List<String> next = kid.next();
          next.add(subsequence);
          return next;
        } else { kid = null; }
      }
    }
    done = true;
    List<String> list = new ArrayList<String>();
    list.add(sequence);
    return list;
  }
  @Override public boolean hasNext() { return !done; }
  @Override public void remove() { throw new UnsupportedOperationException(); }
}

(Excuse the code formatting; it is meant to avoid nested scrollbars.)

For the example call:

Pattern pattern = Pattern.compile(" |-");
String str = "foo bar-baz-zzz";
Split split = new Split(pattern, str);
while (split.hasNext()) {
  System.out.println(split.next());
}

...it emits:

[foo, bar-baz-zzz]
[foo, bar, baz-zzz]
[foo bar, baz-zzz]
[foo, bar-baz, zzz]
[foo, bar, baz, zzz]
[foo bar, baz, zzz]
[foo bar-baz, zzz]
[foo bar-baz-zzz]

I suppose the implementation could be improved.

answered 2009-08-25T16:43:48.720

Why do you need that?

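Note that for a string of N tokens you get 2^(N-1) splittings, containing on the order of N*2^N strings in total (exactly (N+1)*2^(N-2), i.e. 20 strings for the 4-token example). If this is not done carefully, it can consume a lot of memory...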
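The exact count follows from treating each of the N-1 delimiters as independently "split" or "keep": a splitting that uses k of them yields k+1 strings, and the identity sum_k k*C(n,k) = n*2^(n-1) gives the closed form (for N >= 2).

```latex
\sum_{k=0}^{N-1} \binom{N-1}{k}(k+1)
  = 2^{N-1} + (N-1)\,2^{N-2}
  = (N+1)\,2^{N-2},
\qquad N = 4:\; 5 \cdot 2^{2} = 20.
```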

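I guess you probably need to iterate over all of them, right? If so, it is better to create a class that keeps the original string and gives you a different way of splitting it each time you ask. That way you save a lot of memory and get better scalability.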
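The class suggested above can be sketched as a lazy iterator that stores only the string and its delimiter positions, and builds one splitting per `next()` call by counting through bit masks. `SplitIterator` is a hypothetical name; the sketch assumes the delimiters are ' ' and '-' and that there are at most 62 of them, so the mask fits in a `long`. Memory use stays proportional to the input plus one splitting, however many splittings exist.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SplitIterator implements Iterator<List<String>> {
  private final String str;
  private final int[] delims;  // indices of ' ' and '-' in str
  private final long total;    // number of splittings = 2^delims.length
  private long mask = 0;       // bit i set => split at delims[i]

  public SplitIterator(String str) {
    this.str = str;
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (c == ' ' || c == '-') {
        positions.add(i);
      }
    }
    if (positions.size() > 62) {
      throw new IllegalArgumentException("too many delimiters for a long bit mask");
    }
    delims = new int[positions.size()];
    for (int i = 0; i < delims.length; i++) {
      delims[i] = positions.get(i);
    }
    total = 1L << delims.length;
  }

  @Override public boolean hasNext() { return mask < total; }

  @Override public List<String> next() {
    if (!hasNext()) { throw new NoSuchElementException(); }
    List<String> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < delims.length; i++) {
      if ((mask & (1L << i)) != 0) {
        // Split here; the delimiter itself is dropped.
        parts.add(str.substring(start, delims[i]));
        start = delims[i] + 1;
      }
    }
    parts.add(str.substring(start));
    mask++;
    return parts;
  }

  public static void main(String[] args) {
    Iterator<List<String>> it = new SplitIterator("foo bar-baz-zzz");
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

Mask 0 yields the unsplit string and the all-ones mask yields every piece separately; only the current mask is kept between calls.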

answered 2009-08-25T16:26:30.313

There is no library method for this.

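To do this, tokenize the string while keeping the delimiters (in your case " " and "-"), then treat each delimiter as a binary flag and build all combinations from the values of the flags.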

In your case you have 3 delimiters, " ", "-" and "-", so you have 3 binary flags. You will end up with 2^3 = 8 ways of splitting the string.
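A minimal sketch of the flag approach, assuming `java.util.StringTokenizer` with `returnDelims = true` for the tokenizing step and fewer than 31 delimiters so the flags fit in an `int`. `FlagSplit` and `allSplits` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class FlagSplit {
  // One flag per delimiter token: bit set = split there, bit clear = keep it.
  static List<List<String>> allSplits(String str) {
    // returnDelims = true returns " " and "-" as tokens of their own.
    List<String> tokens = new ArrayList<>();
    StringTokenizer st = new StringTokenizer(str, " -", true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }

    int flags = 0;
    for (String t : tokens) {
      if (t.equals(" ") || t.equals("-")) {
        flags++;
      }
    }

    List<List<String>> result = new ArrayList<>();
    for (int mask = 0; mask < (1 << flags); mask++) {
      List<String> parts = new ArrayList<>();
      StringBuilder current = new StringBuilder();
      int flag = 0;
      for (String t : tokens) {
        boolean isDelimiter = t.equals(" ") || t.equals("-");
        if (isDelimiter && (mask & (1 << flag)) != 0) {
          // Flag set: end the current piece and drop the delimiter.
          parts.add(current.toString());
          current.setLength(0);
        } else {
          // Ordinary token, or a delimiter whose flag is clear: keep it.
          current.append(t);
        }
        if (isDelimiter) {
          flag++;
        }
      }
      parts.add(current.toString());
      result.add(parts);
    }
    return result;
  }

  public static void main(String[] args) {
    for (List<String> split : allSplits("foo bar-baz-zzz")) {
      System.out.println(split);
    }
  }
}
```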

answered 2009-08-25T15:40:18.520