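OK, I don't think you should use regular expressions for this, but I couldn't resist throwing some in.
If this is hard to follow, let me know and I'll add some comments...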
    package test;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {

        // Group 2: the first sentence (matched lazily).
        // Group 3 (optional): the delimiter plus everything after it.
        // Group 4: the delimiter, a dot followed by whitespace; the lookbehind
        //          rejects it directly after a dot plus a capital letter ("U.S").
        // Group 5: the rest of the line, i.e. the second sentence.
        private static final Pattern SENTENCE_DELIMITER =
                Pattern.compile("((.+?)((?<!\\.[A-Z])(\\.\\s)(.+))?)");

        public static void main(String[] args) {
            String lineWithOneSentence = "U.S. Navy, World War I";
            String lineWithTwoSentences =
                    "U.S. Navy, World War I. U.S. Air Force, World War III.";
            printSentences(lineWithOneSentence);
            printSentences(lineWithTwoSentences);
        }

        // Prints the whole match and both sentence groups for one line.
        private static void printSentences(String line) {
            Matcher matcher = SENTENCE_DELIMITER.matcher(line);
            if (matcher.matches()) {
                System.out.println("WHOLE MATCH: " + matcher.group(0));
                System.out.println("FIRST SENTENCE: " + matcher.group(2));
                // group(5) is null when the line holds only one sentence.
                System.out.println("SECOND SENTENCE: " + matcher.group(5));
            }
        }
    }
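The approach here is:
- use groups
- use a negative lookbehind on the dot followed by whitespace, to make sure it is not preceded by a dot followed by a capital letter (as in "U.S.")
This is somewhat overkill, and it may become a problem at some point, i.e. if the punctuation in your text is not consistent.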
Output:
WHOLE MATCH: U.S. Navy, World War I
FIRST SENTENCE: U.S. Navy, World War I
SECOND SENTENCE: null
WHOLE MATCH: U.S. Navy, World War I. U.S. Air Force, World War III.
FIRST SENTENCE: U.S. Navy, World War I
SECOND SENTENCE: U.S. Air Force, World War III.
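If you need more than two sentences per line, the same negative lookbehind could also be used on its own as a split delimiter. What follows is only a rough sketch under my own assumptions: the class `SentenceSplitDemo`, the method `splitSentences` and the sample strings are made up for illustration and are not part of the code above. It also shows one case where the lookbehind gets it wrong: a sentence that really does end in an abbreviation like "U.S." is not split.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only meant to illustrate the lookbehind on its own.
class SentenceSplitDemo {

    // A dot followed by whitespace ends a sentence, unless the dot is
    // directly preceded by a dot plus a capital letter (as in "U.S.").
    private static final Pattern SENTENCE_END =
            Pattern.compile("(?<!\\.[A-Z])\\.\\s");

    // Splits a line into sentences; the delimiter (dot + whitespace) is
    // consumed, so only the last sentence keeps its trailing dot.
    static String[] splitSentences(String line) {
        return SENTENCE_END.split(line);
    }

    public static void main(String[] args) {
        String text = "U.S. Navy, World War I. U.S. Air Force, World War III. Done.";
        for (String sentence : splitSentences(text)) {
            System.out.println("[" + sentence + "]");
        }

        // Known weak spot: the abbreviation really ends the sentence here,
        // but the lookbehind cannot tell, so the line stays in one piece.
        String tricky = "He served in the U.S. Then he retired.";
        System.out.println(splitSentences(tricky).length + " sentence(s) found");
    }
}
```

Unlike the grouped pattern above, this does not need `matches()` to cover the whole line, but it has the same blind spot with inconsistent punctuation.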