How can I remove all punctuation but keep whitespace and numbers in Java?
"\\p{Punct}|\\d", "" // This works, but it also removes the numbers, and I want to keep them.
I am reading text from a file line by line, and I need to remove the punctuation from each line while keeping the digits and the whitespace.
String[] internal;
char ch = 'a';
int counter = 1;
int count;
int c;
Map<String, Set<Integer>> dictionary = new HashMap<String, Set<Integer>>(); // word -> line numbers on which it appears
BufferedReader in = new BufferedReader(new FileReader("yu.txt"));
while (in.ready()) {
// This does not work in my case: the "|\\d" alternative removes the numbers as well.
// Otherwise it does what I want. I just need it to keep the numbers and the whitespace.
internal = in.readLine().replaceAll("\\p{Punct}|\\d", "").toLowerCase().split(" ");
for (count = 0; count < internal.length; count++) {
if (!dictionary.containsKey(internal[count])) {
dictionary.put(internal[count], new HashSet<Integer>());
}
if (dictionary.get(internal[count]).size()<10)
{
dictionary.get(internal[count]).add(counter);
}
}
counter++;
}
// Print each word followed by the line numbers recorded for it.
for (String key : dictionary.keySet()) {
System.out.println(key + ": " + dictionary.get(key));
}
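
For reference, here is a minimal sketch of what I am after, assuming that dropping the "|\\d" alternative is enough, so that only \p{Punct} (ASCII punctuation) is removed. The class name PunctuationStripper and the helper methods stripPunctuation and tokenize are hypothetical names used only for this illustration; they are not part of my program above. Splitting on "\\s+" instead of a single space is also an assumption, made so that runs of spaces do not produce empty tokens.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class PunctuationStripper {

    // Removes ASCII punctuation matched by \p{Punct};
    // letters, digits and whitespace are left untouched.
    static String stripPunctuation(String line) {
        return line.replaceAll("\\p{Punct}", "");
    }

    // Lower-cases the cleaned line and splits it on runs of whitespace.
    // Returns an empty array when nothing but punctuation/whitespace was present.
    static String[] tokenize(String line) {
        String cleaned = stripPunctuation(line).toLowerCase().trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [hello, world, its, 2024, room, 42]
        System.out.println(Arrays.toString(tokenize("Hello, World! It's 2024; room #42.")));
    }
}
```

Note that \p{Punct} only covers ASCII punctuation by default, so characters such as curly quotes would survive. If that matters, a pattern like "[^\\p{L}\\p{N}\\s]" (remove everything that is not a letter, a number or whitespace) might be a closer fit, but I have not tried it on my input.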