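I don't understand how \s works in Java regular expressions. In the simple class below, \s seems to match (at least) $ and *, which is worrying. When I leave \s out, the last character of every word gets chopped off. Also, neither regex seems to catch the trailing " in the string. Can someone explain what is going on, or point me to a useful resource? Thanks.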
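One quick way to test the first suspicion is to check individual characters against \s on its own, outside the larger pattern. According to the java.util.regex.Pattern documentation, \s by default (without the UNICODE_CHARACTER_CLASS flag) is the class [ \t\n\x0B\f\r]. The class name WhitespaceCheck and the helper isWhitespace below are hypothetical, made up for this sketch.

```java
import java.util.regex.Pattern;

public class WhitespaceCheck {
    // Compiled once; by default \s is equivalent to [ \t\n\x0B\f\r].
    private static final Pattern WS = Pattern.compile("\\s");

    // True if the single character c is matched by \s.
    public static boolean isWhitespace(char c) {
        return WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // A space and a tab, plus the symbols that looked suspicious.
        char[] samples = { ' ', '\t', '$', '*', '"' };
        for (char c : samples) {
            System.out.println("'" + c + "' matches \\s: " + isWhitespace(c));
        }
    }
}
```

If \s behaves as documented, only the space and the tab report true here, which suggests the surviving $ and * come from somewhere other than \s itself.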
public class SanitizeText {
    public static void main(String[] args)
    {
        String s = "123. ... This is Evil !@#$ Wicked %^&* _ Mean ()+<> and ;:' - Nasty. \\ =\"";
        String t = "123. ... This is Evil !@#$ Wicked %^&* _ Mean ()+<> and ;:' - Nasty. \\ =\"";
        s = s.replaceAll(".[^\\w\\s.]", " "); // Does \s match non-space chars? Seems like at least $ and * are matched.
        s = s.replaceAll(" {2,}", " ");
        t = t.replaceAll(".[^\\w.]", " ");    // Why does this regex chop the trailing char of each word??
        t = t.replaceAll(" {2,}", " ");
        System.out.println("s: " + s);
        System.out.println("t: " + t);
    }
}
// produces:
// s: 123. ... This is Evil $ Wicked * _ Mean and Nasty . "
// t: 123 .. Thi i Evi Wicke Mea an Nast "
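To see what each of the two patterns in SanitizeText actually consumes, one can print every match instead of replacing it. replaceAll walks the input left to right and replaces non-overlapping matches, the same spans that repeated Matcher.find() calls return, so the listing below corresponds to what gets replaced. Both patterns begin with `.`, so every match is exactly two characters: one arbitrary character followed by one character from the negated class. The names RegexMatchDump, INPUT and matches are hypothetical, chosen for this sketch; INPUT is copied from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatchDump {
    // Same input string as in SanitizeText.
    public static final String INPUT =
        "123. ... This is Evil !@#$ Wicked %^&* _ Mean ()+<> and ;:' - Nasty. \\ =\"";

    // Returns every substring that regex matches in input, in order,
    // using the same left-to-right, non-overlapping scan as replaceAll.
    public static List<String> matches(String input, String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The two patterns from the question.
        String[] patterns = { ".[^\\w\\s.]", ".[^\\w.]" };
        for (String p : patterns) {
            System.out.println("pattern " + p + ":");
            for (String match : matches(INPUT, p)) {
                // Brackets make leading and trailing spaces in each match visible.
                System.out.println("  [" + match + "]");
            }
        }
    }
}
```

Comparing the bracketed pairs with the s and t output shows which characters were swallowed by the leading `.` rather than by the bracketed class.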