java - Getting the position in the original string from `StringTokenizer`

Question

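I need to get the space-delimited tokens from a string, but I also need to know the character position in the original string at which each token starts. Is there any way to do this with StringTokenizer? Also, as I understand it, this is a legacy class; is there a better alternative to using StringTokenizer?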

score 8 · Accepted Answer

You should always use String#split() to split your string, rather than StringTokenizer.

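However, since you also want the position of each token in the string, it is better to use the Pattern and Matcher classes. The Matcher#start() method gives you the index in the string at which the current match begins.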

Here is an example:

String str = "abc asf basdfasf asf";
Matcher matcher = Pattern.compile("\\S+").matcher(str);

while (matcher.find()) {
    System.out.println(matcher.start() + ":" + matcher.group());
}

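The pattern \\S+ matches each run of non-whitespace characters in the string. Every call to Matcher#find() advances to the next such match, so the loop visits all tokens in order, and Matcher#start() reports where the current one begins.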
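If the tokens and their positions need to be kept rather than printed, the same loop can be wrapped in a small helper. The following is only a sketch: the class name `TokenPositions`, the nested `Token` type, and the `tokenize` method are hypothetical names chosen for illustration and do not come from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative.
class TokenPositions {

    // A token together with its start offset in the original string.
    static final class Token {
        final String text;
        final int start;

        Token(String text, int start) {
            this.text = text;
            this.start = start;
        }
    }

    // Compiled once: one or more non-whitespace characters.
    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");

    // Collects every whitespace-delimited token along with its start index.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = NON_WHITESPACE.matcher(input);
        while (matcher.find()) {
            tokens.add(new Token(matcher.group(), matcher.start()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("abc asf basdfasf asf")) {
            System.out.println(t.start + ":" + t.text);
        }
    }
}
```

Matcher#end() is also available if the exclusive end offset of each token is needed.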

score 1 · Accepted Answer

You can easily do this yourself with String.split():

 String text = "hello world example";
 int tokenStartIndex = 0;
 for (String token : text.split(" ")) {      
   System.out.println("token: " + token + ", tokenStartIndex: " + tokenStartIndex);
   tokenStartIndex += token.length() + 1; // +1 for the single-space separator
 }

This prints:

token: hello, tokenStartIndex: 0
token: world, tokenStartIndex: 6
token: example, tokenStartIndex: 12
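With consecutive spaces, split(" ") returns empty tokens between them. The running offset stays correct as long as those empty tokens are still counted, so they only need to be skipped when reporting. A minimal sketch of that variation, assuming single spaces are the only separator (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class SplitOffsets {

    // Returns "start:token" entries for every non-empty token separated by spaces.
    static List<String> tokenStarts(String text) {
        List<String> result = new ArrayList<>();
        int tokenStartIndex = 0;
        for (String token : text.split(" ")) {
            // Consecutive spaces produce empty tokens; do not report them...
            if (!token.isEmpty()) {
                result.add(tokenStartIndex + ":" + token);
            }
            // ...but still advance past the token and its one-character separator.
            tokenStartIndex += token.length() + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenStarts("hello  world example"));
    }
}
```

Tabs, newlines, and other whitespace are not treated as separators here, since the split pattern is a single space.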

score 0 · Accepted Answer

I improved micha's answer so that it can handle adjacent spaces:

String text = "hello  world     example";
int start = 0;
for (String token : text.split("[\u00A0 \n]")) {
    if (token.length() > 0) {
        start = text.indexOf(token, start);
        System.out.println("token: " + token + ", start at: " + start);
        // Resume the next search after this token so a repeated token is not found twice.
        start += token.length();
    }
}

The output is:

token: hello, start at: 0
token: world, start at: 7
token: example, start at: 17
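One detail worth checking with an indexOf-based search is where each search resumes: it should start after the previously found token, otherwise an input such as "a a a" could resolve every token to the same position. The sketch below wraps the idea in a method so it can be tested. The names `IndexOfOffsets` and `tokenStarts` are hypothetical, and it splits on `\\s+`, which is an assumption that differs from the character class above (by default `\\s` does not match the non-breaking space U+00A0).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class IndexOfOffsets {

    // Returns the start index of each whitespace-delimited token, in order.
    static List<Integer> tokenStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        int searchFrom = 0;
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields one empty first token
            }
            int start = text.indexOf(token, searchFrom);
            starts.add(start);
            // Resume after this token so repeated tokens map to distinct positions.
            searchFrom = start + token.length();
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(tokenStarts("hello  world     example"));
    }
}
```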
