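I have a list of words: dog, cat, leopard.
I am trying to come up with a regular expression in Java that extracts, from a long paragraph, every sentence containing any of these words (case-insensitively). Sentences end with ., ? or !
Can someone help? Thanks!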
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceFinder {
    public static void main(String[] args) {
        String paragraph = "I have a list of words to match: dog, cat, leopard. But blackdog or catwoman shouldn't match. Dog may bark at the start! Is that meow at the end my cat? Some bonus sentence matches shouldn't hurt. My dog gets jumpy at times and behaves super excited!! My cat sees my goofy dog and thinks WTF?! Leopard likes to quote, \"I'm telling you these Lions suck bro!\" Sometimes the dog asks too, \"Cat got your tongue?!\"";
        // Optional sentence start (capital letter), a whole-word keyword matched
        // case-insensitively, then the rest up to 1-2 terminators and an optional closing quote.
        Pattern p = Pattern.compile("([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]{1,2}\"?");
        Matcher m = p.matcher(paragraph);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
/* Output:
I have a list of words to match: dog, cat, leopard.
Dog may bark at the start!
Is that meow at the end my cat?
My dog gets jumpy at times and behaves super excited!!
My cat sees my goofy dog and thinks WTF?!
Leopard likes to quote, "I'm telling you these Lions suck bro!"
Sometimes the dog asks too, "Cat got your tongue?!"
*/
}
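To illustrate why the lookarounds (?<!\w) and (?!\w) matter, here is a minimal sketch that runs the pattern above next to a naive variant with the lookarounds removed. The class name LookaroundDemo, the helper findAll and the naive pattern are made up for this comparison and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // The pattern from the answer above, with whole-word lookarounds.
    static final Pattern WITH_LOOKAROUNDS = Pattern.compile(
        "([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]{1,2}\"?");

    // Hypothetical naive variant without the lookarounds, for comparison only.
    static final Pattern WITHOUT_LOOKAROUNDS = Pattern.compile(
        "([A-Z][^.?!]*?)?(?i)(dog|cat|leopard)[^.?!]*?[.?!]{1,2}\"?");

    // Collects every match of the pattern in the text, in order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "But blackdog or catwoman shouldn't match. My dog barks.";
        // prints [My dog barks.]
        System.out.println(findAll(WITH_LOOKAROUNDS, text));
        // prints [But blackdog or catwoman shouldn't match., My dog barks.]
        System.out.println(findAll(WITHOUT_LOOKAROUNDS, text));
    }
}
```

Without the lookarounds, "dog" inside "blackdog" is enough to pull in the whole first sentence.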
Simplified regex if "quotes?!" (or informal punctuation) do not need to be handled:
"([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]"
To also pick up sentences that do not start with an uppercase letter (if the input may contain such typos):
"(?i)([a-z][^.?!]*?)?(?<!\\w)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]"
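The following assumes that sentences start with an uppercase letter and that ., ! and ? do not occur inside a sentence except at its end.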
String str = "Hello. It's a leopard I think. How are you? It's just a dog or a cat. Are you sure?";
// [A-Z] stays case-sensitive because it precedes (?i); the word list does not.
Pattern p = Pattern.compile("[A-Z](?i)[^.?!]*?\\b(dog|cat|leopard)\\b[^.?!]*[.?!]");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group());
}
// Output:
// It's a leopard I think.
// It's just a dog or a cat.
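If the word list is not fixed, the same pattern can be assembled at runtime. The following is only a sketch under the same assumptions (capitalised sentence starts, terminators only at the end); the class SentenceExtractor and its methods buildPattern and extract are hypothetical names, and Pattern.quote is used so that words containing regex metacharacters are treated literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SentenceExtractor {
    // Hypothetical helper: builds the sentence pattern for an arbitrary word list.
    static Pattern buildPattern(List<String> words) {
        String alternation = words.stream()
            .map(Pattern::quote)               // escape any regex metacharacters
            .collect(Collectors.joining("|"));
        return Pattern.compile(
            "[A-Z](?i)[^.?!]*?\\b(" + alternation + ")\\b[^.?!]*[.?!]");
    }

    // Returns every sentence of the text that contains one of the words.
    static List<String> extract(String text, List<String> words) {
        List<String> sentences = new ArrayList<>();
        Matcher m = buildPattern(words).matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String str = "Hello. It's a leopard I think. How are you? It's just a dog or a cat. Are you sure?";
        // prints [It's a leopard I think., It's just a dog or a cat.]
        System.out.println(extract(str, Arrays.asList("dog", "cat", "leopard")));
    }
}
```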
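This should do it. You just need to fill in the words you want in the middle. Example:
Hello, I am a dog and I like to do things? Don't mistake my weakness for kindness. My bark is worse than a leopard's bite! So adopt me instead of another animal. Like a cat.
Matches:
Hello, I am a dog and I like to do things? My bark is worse than a leopard's bite! Like a cat.
Also add (?i) to ignore case. I did not put it in because I could not remember the syntax, but someone else has already written it.
"(?=.*?\\.)[^ .?!][^.?!]*?(dog|cat|leopard).*?[.?!]"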
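Try this regex:
Pattern.compile("(?i)(^|\\s+)(dog|cat|leopard)(\\s+|[.?!]$)").matcher(str).find();
(?i) is a special construct that makes the match case-insensitive. find() is used here instead of str.matches(...), because matches only succeeds when the regex covers the entire string.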
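A minimal sketch of how String.matches and Matcher.find differ for this regex when it is applied to one sentence at a time. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    static final String REGEX = "(?i)(^|\\s+)(dog|cat|leopard)(\\s+|[.?!]$)";

    // String.matches succeeds only if the regex covers the entire input.
    static boolean wholeStringMatches(String sentence) {
        return sentence.matches(REGEX);
    }

    // Matcher.find succeeds if the regex matches anywhere in the input.
    static boolean containsMatch(String sentence) {
        return Pattern.compile(REGEX).matcher(sentence).find();
    }

    public static void main(String[] args) {
        String sentence = "I have a dog.";
        System.out.println(wholeStringMatches(sentence)); // prints false
        System.out.println(containsMatch(sentence));      // prints true
    }
}
```

Either way, this regex only reports whether a sentence contains one of the words; splitting the paragraph into sentences has to happen separately.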
.*(cat|dog|leopard).*(\.|\?|\!)$ and you should use the CASE_INSENSITIVE option of java.util.regex.Pattern.
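As a sketch of the CASE_INSENSITIVE option: the flag is passed as the second argument to Pattern.compile and has the same effect as a leading (?i). The pattern below is a simplified whole-word check written for this illustration, not the regex from the answer, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CaseInsensitiveFlagDemo {
    // The flag replaces an embedded (?i) at the start of the pattern.
    static final Pattern WORDS =
        Pattern.compile("\\b(dog|cat|leopard)\\b", Pattern.CASE_INSENSITIVE);

    // True if the sentence contains one of the words as a whole word.
    static boolean mentionsWord(String sentence) {
        return WORDS.matcher(sentence).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsWord("My DOG barks."));   // prints true
        System.out.println(mentionsWord("Catwoman waved.")); // prints false
    }
}
```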