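I need a sentence parser. The parser splits a complete sentence on whitespace, and it treats the entire contents of a pair of parentheses as a single word (one parsed word).
Input sentence:
"This is the work (my real job) which is great."
Desired output: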
This
is
the
work
(my real job)
which
is
great.
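I'm not sure there is a nice way to parse words out of a sentence like this with regular expressions. You will probably have to iterate through the sentence anyway; I don't think String.split() will do it for you. Just write a loop that does it, and then you can handle details such as mismatched parentheses yourself. For example, this treats everything after an opening parenthesis as one word, even if the sentence ends without a closing parenthesis: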
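import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceParser {
    public static void main(String[] args) {
        String s = "This is the work (my real job) which is great, and (also some stuff";
        List<String> words = new ArrayList<>();
        Scanner sentence = new Scanner(s);   // Scanner.next() splits on whitespace by default
        boolean inParen = false;             // true while collecting a parenthesized group
        StringBuilder inParenWord = new StringBuilder();
        while (sentence.hasNext()) {
            String word = sentence.next();
            if (inParen) {
                // Inside a group: re-join the tokens with a single space.
                inParenWord.append(" ");
                inParenWord.append(word);
            } else if (word.startsWith("(")) {
                // A token starting with '(' opens a new group.
                inParen = true;
                inParenWord.append(word);
            } else {
                words.add(word);
                continue;
            }
            // Close the group on a token ending with ')'; this also covers
            // one-token groups like "(job)", which the start branch alone would miss.
            if (word.endsWith(")")) {
                words.add(inParenWord.toString());
                inParenWord.setLength(0);
                inParen = false;
            }
        }
        sentence.close();
        // An unclosed '(' leaves a pending group; keep it as one final word.
        if (inParenWord.length() > 0) {
            words.add(inParenWord.toString());
        }
        for (String word : words) {
            System.out.println(word);
        }
    }
}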
This will output:
This
is
the
work
(my real job)
which
is
great,
and
(also some stuff
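The boolean flag above cannot keep nested groups together: "(a (b) c)" is cut at the first token ending in ")". As a sketch, assuming nested parentheses should stay inside one word, the loop can instead count parenthesis depth character by character. The class `ParenSplitter` and the method `splitWords` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ParenSplitter {

    // Hypothetical helper: splits on whitespace, but keeps everything between
    // a '(' and its matching ')' (including nested pairs) in a single word.
    // An unclosed '(' runs to the end of the input, like the loop above.
    static List<String> splitWords(String s) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of '(' currently open

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) && depth == 0) {
                // Whitespace outside parentheses ends the current word.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            current.append(c); // whitespace inside parentheses is kept as-is
        }
        if (current.length() > 0) {
            words.add(current.toString()); // last word, or an unclosed group
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "This is the work (my real job) which is great, and (also some stuff";
        for (String word : splitWords(s)) {
            System.out.println(word);
        }
    }
}
```

With the sample sentence this prints the same ten words as the Scanner loop. One difference: a '(' in the middle of a token, as in "work(my real job)", also opens a group here, so that whole run becomes one word.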
Or, using Pattern/Matcher:
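import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class SentenceRegex {
    public static void main(String[] args) {
        String s = "This is the work (my real job) which is great, and (also some stuff";
        List<String> words = new ArrayList<>();
        // Either a '(' group (closed by ')' if one follows), or a token not starting with '('.
        Pattern p = Pattern.compile("\\([^)]*\\)?|[^\\s(]\\S*");
        Matcher m = p.matcher(s);
        while (m.find()) {
            words.add(m.group()); // the matched text, no trimming needed
        }
        for (String word : words) {
            System.out.println(word);
        }
    }
}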
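I believe you need something like this (though I'm not sure this regex works 100% of the time).
Put simply: match (\(words-and-spaces-non-greedy\)) | (word-with-no-spaces), trying the parenthesized form first, and call Matcher.find() repeatedly rather than anchoring with ^...$:
\(.+?\)|\S+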
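A minimal sketch of the idea described above, assuming the two alternatives are tried with the parenthesized group first and that Matcher.find() is called in a loop, so each match is one word (no ^...$ anchors). The class `RegexSplitter` and the method `splitWords` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSplitter {

    // Alternative 1: '(' then as few characters as possible, then ')'.
    // Alternative 2: a run of non-whitespace characters.
    // The group alternative comes first so it wins when a token starts with '('.
    private static final Pattern WORD = Pattern.compile("\\(.+?\\)|\\S+");

    static List<String> splitWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : splitWords("This is the work (my real job) which is great.")) {
            System.out.println(word);
        }
    }
}
```

Because the non-greedy group stops at the first ')', nested parentheses are not handled, and a '(' with no ')' anywhere after it falls back to ordinary whitespace-separated words, e.g. "(also", "some", "stuff".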