I have the following text (a String); System.out.println(text) prints:
..............
BLOOMINGTON, IL 61710
Page 4 of 5
8/2/2009file://C:\hjO Fhjes\hShjort_2012w211231_0323212_575.htm
Location: EAST JEFRYN, NY
..............
I need to remove any substring that starts with the word "Page" and ends with ".htm".
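The requirement can also be met with plain string search, which avoids regex escaping entirely. The sketch below makes some assumptions: "Page" and ".htm" are matched case-sensitively, each block ends at the first ".htm" after its "Page", and a "Page" with no later ".htm" is left alone. The class and method names (PageBlockRemover, removePageBlocks) are hypothetical, and the sample string is a shortened copy of the text above.

```java
public class PageBlockRemover {

    // Hypothetical helper: removes every substring that starts with "Page"
    // and ends with the first following ".htm" (both markers included).
    static String removePageBlocks(String text) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = text.indexOf("Page", pos);
            if (start < 0) break;                 // no more "Page": done
            int end = text.indexOf(".htm", start);
            if (end < 0) break;                   // no closing ".htm": keep the rest as-is
            out.append(text, pos, start);         // keep everything before "Page"
            pos = end + ".htm".length();          // continue right after ".htm"
        }
        out.append(text, pos, text.length());     // keep the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "BLOOMINGTON, IL 61710\n"
                + "Page 4 of 5\n"
                + "8/2/2009file://C:\\hjO Fhjes\\hShjort_2012w211231_0323212_575.htm\n"
                + "Location: EAST JEFRYN, NY";
        System.out.println(removePageBlocks(text));
    }
}
```

With the sample, the "Page 4 of 5" line and the file line disappear, leaving an empty line between "BLOOMINGTON, IL 61710" and "Location: EAST JEFRYN, NY", because the newline before "Page" is kept.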
I tried the following:
Pattern patternP = Pattern.compile("(?:Page.*?)(\\n+)+htm", Pattern.DOTALL);
Matcher matcherP = patternP.matcher(filtered);
matcherP.find();
String page = matcherP.group();
text = text.replace(page, "");
But this doesn't filter anything out. I think it's because of the escape characters. How can I fix it?
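The escaping does not appear to be the culprit: in a Java string literal "\\n" becomes the regex \n, which matches a newline as intended. The likelier problem is that (\\n+)+htm requires one or more newline characters immediately before htm, while in the sample htm is preceded by a dot. So find() returns false, and the unconditional group() call that follows throws IllegalStateException. (The snippet also matches against filtered but replaces in text.) A minimal sketch of an alternative, assuming the same case-sensitive markers: match "Page" lazily up to a literal \.htm, use DOTALL so . can cross the line break, and let replaceAll remove every occurrence. The class name PageRegexRemover is hypothetical.

```java
import java.util.regex.Pattern;

public class PageRegexRemover {

    // "Page", then as few characters as possible (DOTALL lets '.' also match
    // newlines), then a literal ".htm" (the dot is escaped as \\. in Java source).
    private static final Pattern PAGE_BLOCK =
            Pattern.compile("Page.*?\\.htm", Pattern.DOTALL);

    static String removePageBlocks(String text) {
        // replaceAll removes every match; when nothing matches it returns the
        // input unchanged instead of throwing the way group() does.
        return PAGE_BLOCK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "BLOOMINGTON, IL 61710\n"
                + "Page 4 of 5\n"
                + "8/2/2009file://C:\\hjO Fhjes\\hShjort_2012w211231_0323212_575.htm\n"
                + "Location: EAST JEFRYN, NY";
        System.out.println(removePageBlocks(text));
    }
}
```

The lazy .*? matters when the text contains several "Page ... .htm" blocks: a greedy .* would swallow everything from the first "Page" to the last ".htm". If the empty line left behind is unwanted, the pattern "Page.*?\\.htm\\R?" (Java 8+) also consumes the line break after ".htm".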