如果我理解正确,您正在寻找由双引号 (") 分隔的子字符串。您可以在正则表达式中使用捕获组:
String text = "Vulcans are a humanoid species in the fictional \"Star Trek\"" +
" universe who evolved on the planet Vulcan and are noted for their " +
"attempt to live by reason and logic with no interference from emotion" +
" They were the first extraterrestrial species officially to make first" +
" contact with Humans and later became one of the founding members of the" +
" \"United Federation of Planets\"";
String[] entities = new String[10]; // An array to hold matched substrings
Pattern pattern = Pattern.compile("[\"](.*?)[\"]"); // The regex pattern to use
Matcher matcher = pattern.matcher(text); // The matcher - our text - to run the regex on
int startFrom = text.indexOf('"'); // The index position of the first " character
int endAt = text.lastIndexOf('"'); // The index position of the last " character
int count = 0; // An index for the array of matches
while (startFrom <= endAt) { // startFrom will be changed to the index position of the end of the last match
matcher.find(startFrom); // Run the regex find() method, starting at the first " character
entities[count++] = matcher.group(1); // Add the match to the array, without its " marks
startFrom = matcher.end(); // Update the startFrom index position to the end of the matched region
}
或者用字符串函数编写一个“解析器”:
int startFrom = text.indexOf('"'); // The index-position of the first " character
int nextQuote = text.indexOf('"', startFrom+1); // The index-position of the next " character
int count = 0; // An index for the array of matches
while (startFrom > -1) { // Keep looping as long as there is another " character (if there isn't, or if it's index is negative, the value of startFrom will be less-than-or-equal-to -1)
entities[count++] = text.substring(startFrom+1, nextQuote); // Retrieve the substring and add it to the array
startFrom = text.indexOf('"', nextQuote+1); // Find the next " character after nextQuote
nextQuote = text.indexOf('"', startFrom+1); // Find the next " character after that
}
在这两种情况下,为了示例,示例文本都是硬编码的,并且假定存在相同的变量(名为 的字符串变量text
)。
如果要测试entities
数组的内容:
int i = 0;
while (i < count) {
System.out.println(entities[i]);
i++;
}
我必须警告你,边界/边界情况可能存在问题(即,当 " 字符位于字符串的开头或结尾时。如果 " 字符的奇偶校验不均匀(即如果有是文本中奇数个 " 字符)。您可以事先使用简单的奇偶校验:
static int countQuoteChars(String text) {
int nextQuote = text.indexOf('"'); // Find the first " character
int count = 0; // A counter for " characters found
while (nextQuote != -1) { // While there is another " character ahead
count++; // Increase the count by 1
nextQuote = text.indexOf('"', nextQuote+1); // Find the next " character
}
return count; // Return the result
}
static boolean quoteCharacterParity(int numQuotes) {
if (numQuotes % 2 == 0) { // If the number of " characters modulo 2 is 0
return true; // Return true for even
}
return false; // Otherwise return false
}
请注意,如果numQuotes
碰巧0
这个方法仍然返回true
(因为 0 模任何数字都是 0,所以(count % 2 == 0)
将是true
)尽管如果没有 " 字符,你不想继续解析,所以你想检查这种情况某处。
希望这可以帮助!