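Basically, I have a bunch of large strings from which I want to strip whitespace, punctuation, and digits. I only want the words.
Here is my code: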
String str = "hughes/conserdyne corp, unit <hughes capital corp> made bear stearns <bsc> exclusive investment banker develop market 2,188,933 financing design installation micro-utility systems municipalities. company systems self-contained electrical generating facilities alternate power sources, photovoltaic cells, replace public utility power sources.";
String[] arr = str.split("[\\p{P}\\s\\t\\n\\r<>\\d]");
for (int i = 0; i < arr.length; i++) {
    if (arr[i] != null)
        System.out.println(arr[i]);
}
This is the output I get:
hughes
conserdyne
corp
unit
lt
hughes
capital
corp
made
bear
stearns
lt
bsc
exclusive
investment
banker
develop
market
financing
design
installation
micro
utility
systems
municipalities
company
systems
self
contained
electrical
generating
facilities
alternate
power
sources
photovoltaic
cells
replace
public
utility
power
sources
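As you can see, the output contains a lot of blank lines, and they appear where the commas and digits used to be. I get the same result with or without the if condition in front of the print statement.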
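To narrow this down, here is a minimal sketch of the same behaviour on a short fragment of the input. The class name `EmptyTokenDemo` and the method name `splitOnSingleDelimiters` are made up for illustration, and the regex is a trimmed version of mine (`\t`, `\n`, and `\r` are already covered by `\s`). It suggests the blank lines are empty strings rather than nulls, which would explain why the null check changes nothing.

```java
import java.util.Arrays;

public class EmptyTokenDemo {
    // Hypothetical helper: splits on exactly one delimiter character at a time,
    // the same way the character class in the question does.
    public static String[] splitOnSingleDelimiters(String text) {
        return text.split("[\\p{P}\\s<>\\d]");
    }

    public static void main(String[] args) {
        String[] parts = splitOnSingleDelimiters("sources, photovoltaic");

        // The comma and the following space are two separate delimiters,
        // so an empty string ends up between them.
        System.out.println(Arrays.toString(parts)); // [sources, , photovoltaic]

        // The extra element is an empty string, not null.
        System.out.println(parts[1] == null);   // false
        System.out.println(parts[1].isEmpty()); // true
    }
}
```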
However, if I join all the elements of arr into a new string and then split that string with the regex "\s+", it works and produces the correct output.
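For reference, the workaround looks roughly like the sketch below. The names `JoinAndResplit` and `wordsOnly` are hypothetical, and it assumes the elements are joined with a single space and that Java 8 or later is available for `String.join`. The `trim()` call is an addition to avoid a leading empty token when the input starts with a delimiter.

```java
import java.util.Arrays;

public class JoinAndResplit {
    // Hypothetical helper: first pass splits on single delimiter characters
    // (leaving empty strings), second pass re-splits on runs of whitespace.
    public static String[] wordsOnly(String text) {
        String[] rough = text.split("[\\p{P}\\s<>\\d]");
        // Empty elements turn into consecutive spaces once joined.
        String joined = String.join(" ", rough).trim();
        // "\\s+" treats each run of spaces as one delimiter.
        return joined.split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "develop market 2,188,933 financing design";
        System.out.println(Arrays.toString(wordsOnly(sample)));
        // [develop, market, financing, design]
    }
}
```

The second split collapses each run of spaces into one delimiter, which is the step my original regex does not seem to perform.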
So what is wrong with my current regex? Any help would be greatly appreciated.