java - Java: removing punctuation from a string (including “” and the like) while keeping accented characters

Asked 2017-11-18T13:45:18.380

Viewed 1439 times

1 answer

The regex \p{Punct} only matches US-ASCII punctuation by default, unless you enable Unicode character classes with the Pattern.UNICODE_CHARACTER_CLASS flag (or the equivalent inline flag (?U)). That means your code, as written, only removes these characters:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
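As a minimal sketch of the difference (the class name, method names and sample text are made up for illustration, not taken from the question), this compiles \p{Punct} twice, once in the default mode and once with Pattern.UNICODE_CHARACTER_CLASS, and applies both to a string containing curly quotes and accented letters. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares \p{Punct} with and without
// Pattern.UNICODE_CHARACTER_CLASS.
class PunctFlagDemo {
    // Default mode: \p{Punct} is only the 32 US-ASCII characters listed above.
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");

    // UNICODE_CHARACTER_CLASS (inline form: (?U)) switches \p{Punct}
    // to the Unicode punctuation property.
    static final Pattern UNICODE_PUNCT =
            Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);

    static String stripAscii(String s) {
        return ASCII_PUNCT.matcher(s).replaceAll("");
    }

    static String stripUnicode(String s) {
        return UNICODE_PUNCT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample text (assumed): accented letters plus curly double quotes.
        String text = "Ol\u00e1, \u201cmundo\u201d! \u00c9 caf\u00e9?";
        System.out.println(stripAscii(text));   // curly quotes are kept
        System.out.println(stripUnicode(text)); // curly quotes are removed
    }
}
```

In both cases the comma, exclamation mark and question mark are removed and the accented letters are kept; only the flagged pattern also removes the curly quotes.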

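If you want to match everything the Unicode Consortium classifies as punctuation, try \p{IsPunctuation} instead. It always checks Unicode character properties, so it matches all the punctuation in your example, including the curly quotes “”, and more. One caveat: Unicode classifies some characters from the ASCII list above ($, +, <, =, >, ^, `, | and ~) as symbols rather than punctuation, so \p{IsPunctuation} does not match those.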
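A small sketch of \p{IsPunctuation} on its own (hypothetical class and method names; the sample strings are invented). The sample "a$b+c." shows which ASCII characters it leaves alone. The stripWithAscii method is an optional variant, not part of the answer: it unions \p{IsPunctuation} with the ASCII \p{Punct} class, in case you also want characters such as $ and + removed.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for \p{IsPunctuation}; no flag is needed.
class IsPunctuationDemo {
    // Matches code points in the punctuation general categories
    // (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{IsPunctuation}");

    // Optional variant (not from the answer): also remove the ASCII \p{Punct}
    // characters, so symbols such as $ and + disappear too.
    static final Pattern UNICODE_OR_ASCII_PUNCT =
            Pattern.compile("[\\p{IsPunctuation}\\p{Punct}]");

    static String strip(String s) {
        return UNICODE_PUNCT.matcher(s).replaceAll("");
    }

    static String stripWithAscii(String s) {
        return UNICODE_OR_ASCII_PUNCT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Inverted question mark and guillemets are Unicode punctuation;
        // only the letters and the space remain.
        System.out.println(strip("\u00bfQu\u00e9? \u00abs\u00ed\u00bb"));
        // '$' and '+' are Unicode symbols, so only the '.' is removed here.
        System.out.println(strip("a$b+c."));          // a$b+c
        System.out.println(stripWithAscii("a$b+c.")); // abc
    }
}
```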

To remove whitespace as well as punctuation, as in your example, use:

    line = line.replaceAll("\\p{IsPunctuation}|\\p{IsWhite_Space}", "");
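Putting it together, here is a minimal sketch that applies this expression to a sample line (the class name, method name and sample text are assumptions). It precompiles the pattern because String.replaceAll compiles its regex on every call, which adds up if you process many lines; the result is the same either way.

```java
import java.util.regex.Pattern;

// Hypothetical demo class applying the answer's expression to one line.
class StripLineDemo {
    // Same regex as in line.replaceAll(...) above, compiled once for reuse;
    // String.replaceAll compiles its regex on every call.
    static final Pattern PUNCT_OR_SPACE =
            Pattern.compile("\\p{IsPunctuation}|\\p{IsWhite_Space}");

    static String clean(String line) {
        return PUNCT_OR_SPACE.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample line (assumed): accented letters, curly quotes, spaces.
        String line = "Ol\u00e1, \u201cmundo\u201d! \u00c9 caf\u00e9?";
        // Accented letters are neither punctuation nor whitespace, so they stay.
        System.out.println(clean(line));
        // A no-break space (U+00A0) and a tab are also Unicode white space.
        System.out.println(clean("a\u00a0b\tc"));
    }
}
```

For the sample line this prints OlámundoÉcafé: the comma, curly quotes, exclamation mark, question mark and spaces are gone, and the accented letters are untouched.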

Answered 2017-11-18T13:52:42.597
