java - StringTokenizer - How to ignore spaces within a string

Question

I am trying to use StringTokenizer on the following list of words:

String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.

When I use StringTokenizer with a space as the delimiter, like this:

StringTokenizer tokens = new StringTokenizer(sentence, " ");

I expect the output to be separate tokens, like this:

Name:jon

location:3333 abc street

country:usa

But StringTokenizer also splits the value of location, so the output looks like this:

Name:jon

location:3333

abc

street

country:usa
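A minimal reproduction of the behavior above (the class and helper names are illustrative). StringTokenizer treats every character in its delimiter string as a separator and has no notion of quoted strings, so the spaces inside "3333 abc street" split the value just like the spaces between pairs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerRepro {
    // Collects every token StringTokenizer produces for the given delimiter set.
    static List<String> tokenize(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(s, delims);
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        // Every space is a delimiter, including the ones inside the quoted value,
        // so this prints 5 tokens: "3333, abc and street" come out separately.
        for (String t : tokenize(sentence, " ")) {
            System.out.println(t);
        }
    }
}
```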

Please let me know how to solve this problem. If I need to use a regular expression, what kind of expression should I specify?

score 5 · Accepted Answer

This can be handled easily using a CSV reader.

import com.csvreader.CsvReader; // JavaCSV library (assumed: the answer does not name the dependency)

String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";

// prepare String for CSV parsing: "key":"value" becomes "key:value"
CsvReader reader = CsvReader.parse(str.replaceAll("\" *: *\"", ":"));
reader.setDelimiter(' '); // use space as the delimiter
reader.readRecord(); // read one CSV record (throws IOException)
for (int i=0; i<reader.getColumnCount(); i++) // loop through columns
    System.out.printf("Scol[%d]: [%s]%n", i, reader.get(i));
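To see what the `replaceAll` step contributes, here is only that preprocessing in plain Java, with no CSV dependency (the class and method names are hypothetical). After it runs, each pair is a single double-quoted field, so a CSV reader using space as the delimiter and, assuming its default text qualifier is the double quote, can keep the inner spaces inside one column:

```java
public class CsvPrep {
    // Collapses  ":"  (with optional spaces around the colon) into a single colon,
    // so each "key":"value" pair becomes one quoted field: "key:value".
    static String prepare(String s) {
        return s.replaceAll("\" *: *\"", ":");
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(prepare(str));
        // prints: "Name:jon" "location:3333 abc street" "country:usa"
    }
}
```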

Update: here is a pure Java SDK solution:

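import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1: shortest run of characters followed by whitespace outside quotes, or by end of string
Pattern p = Pattern.compile("(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)");
Matcher m = p.matcher(str);
for (int i=0; m.find(); i++)
    System.out.printf("Scol[%d]: [%s]%n", i, m.group(1).replace("\"", "")); // strip the double quotes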

Output:

Scol[0]: [Name:jon]
Scol[1]: [location:3333 abc street]
Scol[2]: [country:usa]

Live demo: http://ideone.com/WO0NK6

Explanation, as requested in the OP's comment:

I am using this regex:

(.+?)(\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)|$)

Now let's break it down into smaller chunks.

PS: DQ stands for double quote.

(?:[^\"]*\")                    0 or more non-DQ characters followed by one DQ (RE1)
(?:[^\"]*\"){2}                 Exactly one pair of RE1
(?:(?:[^\"]*\"){2})*            0 or more pairs of RE1
(?:(?:[^\"]*\"){2})*[^\"]*$     0 or more pairs of RE1, followed by 0 or more non-DQ characters, followed by end of string (RE2)
(?=(?:(?:[^\"]*\"){2})*[^\"]*$) Positive lookahead for RE2

.+?  Match 1 or more characters (? makes the match non-greedy)
\\s+ Should be followed by one or more whitespace characters
(\\s+(?=RE2)|$) Should be followed by whitespace that satisfies RE2, or by end of string

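In short: match any run of 1 or more characters followed by "whitespace or end of string", where the whitespace must be followed by an even number of DQs. So whitespace outside double quotes is matched, while whitespace inside double quotes is not (because it is followed by an odd number of DQs).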
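The even-DQ rule can also be checked without the lookahead. This sketch (class and method names are illustrative) applies the same test directly: a whitespace character is a separator only if the rest of the string contains an even number of double quotes. Like the regex, it rescans the remainder for every whitespace character, so it is quadratic in the worst case; for input with balanced quotes, a single left-to-right pass that toggles an "inside quotes" flag gives the same split in linear time.

```java
import java.util.ArrayList;
import java.util.List;

public class EvenQuoteRule {
    // A whitespace character at index i is a separator only if the rest of
    // the string after it contains an even number of double quotes.
    static boolean isSeparator(String s, int i) {
        if (!Character.isWhitespace(s.charAt(i))) {
            return false;
        }
        long quotesAfter = s.substring(i + 1).chars().filter(c -> c == '"').count();
        return quotesAfter % 2 == 0;
    }

    // Splits on separator whitespace and drops the double quotes, like the answer does.
    static List<String> split(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (isSeparator(s, i)) {
                if (current.length() > 0) {
                    tokens.add(current.toString().replace("\"", ""));
                    current.setLength(0);
                }
            } else {
                current.append(s.charAt(i)); // includes spaces inside quotes
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().replace("\"", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(split(str)); // [Name:jon, location:3333 abc street, country:usa]
    }
}
```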

score 2 · Accepted Answer

StringTokenizer is too simple for this job. If you don't need to handle quotes inside values, you can try this regex:

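import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
Pattern p = Pattern.compile("\"([^\"]*)\""); // capture the contents of each "..." run
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}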

Output:

Name
jon
location
3333 abc street
country
usa
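To get from this output to the `key:value` tokens the question asks for, the captured strings can be joined two at a time. A minimal sketch, assuming the quoted strings strictly alternate key, value (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedPairs {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Collects the contents of every "..." run, then joins them pairwise as key:value.
    // Assumes keys and values alternate; a trailing unpaired string is ignored.
    static List<String> pairs(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            parts.add(m.group(1));
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < parts.size(); i += 2) {
            result.add(parts.get(i) + ":" + parts.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        for (String pair : pairs(s)) {
            System.out.println(pair); // Name:jon, then location:3333 abc street, then country:usa
        }
    }
}
```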

This won't handle inner quotes within values, e.g. where the output should be

Name: Fred ("Freddy") Jones
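A quick way to see this limitation is to feed the same pattern a value containing inner quotes (the class name and sample input are illustrative). The first inner quote ends the value early, so the value comes out in pieces and the quoted nickname is skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerQuoteDemo {
    // Same pattern as the answer: the contents of each "..." run.
    static List<String> quoted(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Input text: "Name":"Fred ("Freddy") Jones"
        String s = "\"Name\":\"Fred (\"Freddy\") Jones\"";
        System.out.println(quoted(s)); // [Name, Fred (, ) Jones]
    }
}
```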

score 1 · Accepted Answer

You could use JSON; it looks like you are using a JSON-like pattern. Do a little googling and try implementing it with JSON.

String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\""; // etc.

would be key-value pairs in JSON: e.g. Name is the key and jon is the value, location is the key and 3333 abc street is the value, and so on.

Give it a try. Here is a link: http://www.mkyong.com/java/json-simple-example-read-and-write-json/
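Note that the string as shown is not valid JSON by itself: a JSON object needs enclosing braces and commas between pairs. A minimal sketch, assuming the quote-space-quote sequence only ever appears between pairs and never inside a key or value, that turns it into JSON text; the result could then be handed to a JSON parser such as json-simple from the linked tutorial, which is left out here to keep the example dependency-free (the class and method names are hypothetical):

```java
public class ToJsonText {
    // Wraps the pairs in braces and replaces each  " "  separator with  ","  .
    // Assumption: quote-space-quote never occurs inside a key or value.
    static String toJson(String sentence) {
        return "{" + sentence.replace("\" \"", "\",\"") + "}";
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        System.out.println(toJson(sentence));
        // prints: {"Name":"jon","location":"3333 abc street","country":"usa"}
    }
}
```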

Edit: it's a somewhat silly answer, but you could try something like this:

sentence = sentence.replaceAll("\" ", " ");
StringTokenizer tokens = new StringTokenizer(sentence, " ");
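As written, replacing `" ` with a space still leaves the spaces inside "3333 abc street", so tokenizing on a space still splits that value. One way to keep the replace-then-tokenize idea is to mark only the boundaries between pairs and tokenize on that marker instead. A sketch, assuming pairs are always separated by quote-space-quote and that `|` never appears in the data (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ReplaceThenTokenize {
    // Marks the boundaries between pairs with '|', then tokenizes on '|' only.
    // Assumptions: pairs are separated by quote-space-quote, and '|' is not in the data.
    static List<String> split(String sentence) {
        String marked = sentence.replace("\" \"", "\"|\"");
        List<String> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(marked, "|");
        while (tokens.hasMoreTokens()) {
            // Drop the quotes to get key:value.
            out.add(tokens.nextToken().replace("\"", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "\"Name\":\"jon\" \"location\":\"3333 abc street\" \"country\":\"usa\"";
        for (String token : split(sentence)) {
            System.out.println(token); // Name:jon, then location:3333 abc street, then country:usa
        }
    }
}
```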
