I am working on a project to extract uppercase words from a .doc file in Java. I am using a regex, but the one below was written by someone else for an old VBA script. I need to find all uppercase words that are surrounded by parentheses, for example (WORD). I know the regex below will give me a dangling meta character error, so what should the regex be for this?
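    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    // Regex carried over from the old VBA script (this is the pattern in question).
    private static final String REGEX = "(*[A-Z]*[A-Z]*)";

    private void parseWordText(File file) throws IOException {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fs = new FileInputStream(file)) {
            HWPFDocument doc = new HWPFDocument(fs);
            WordExtractor we = new WordExtractor(doc);
            String[] dataArray = we.getParagraphText();
            if (dataArray != null) {
                Pattern p = Pattern.compile(REGEX); // compile once, not once per paragraph
                List<String> sequences = new ArrayList<>();
                for (String data : dataArray) {
                    Matcher m = p.matcher(data);
                    while (m.find()) {
                        // m.group() is the same text as data.substring(m.start(), m.end())
                        sequences.add(m.group());
                        System.out.println(m.group());
                    }
                }
            }
        }
    }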
With the code above and the regex, I am getting two uppercase letters, not only the all-uppercase words in parentheses.
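
For reference, here is a minimal sketch of the behavior I am after, kept separate from the POI code so it can be run on its own. It assumes the parentheses have to be escaped to be treated as literal characters (`\\(` and `\\)` in a Java string literal) and that a "word" means one or more letters A-Z with nothing else between the parentheses. The class name `ParenUppercaseFinder`, the method `findParenthesizedUppercase`, and the pattern itself are my own illustrative choices, not something taken from the VBA script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenUppercaseFinder {

    // Assumed pattern: literal "(", one or more uppercase A-Z letters, literal ")".
    // Group 1 holds the word without the surrounding parentheses.
    private static final Pattern PAREN_UPPER = Pattern.compile("\\(([A-Z]+)\\)");

    // Returns every match including its parentheses, e.g. "(WORD)".
    static List<String> findParenthesizedUppercase(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = PAREN_UPPER.matcher(text);
        while (m.find()) {
            matches.add(m.group()); // use m.group(1) for the word without parentheses
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "See (WORD) and (Mixed) and (ABC).";
        System.out.println(findParenthesizedUppercase(sample)); // prints [(WORD), (ABC)]
    }
}
```

In `parseWordText`, each paragraph string from `getParagraphText()` would be passed to this method in place of the current matcher loop. If words containing digits or spaces also need to match, the character class would have to be widened.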