I am working on a project to extract uppercase words from a .doc file in Java. I am using a regex, but the one below was taken from someone else's old .vba script. I need to find all uppercase words that are surrounded by parentheses, for example (WORD). I know the regex below will give me a dangling meta character error, so what should the regex be?

private static final String REGEX = "(*[A-Z]*[A-Z]*)";
private void parseWordText(File file) throws IOException { 
    FileInputStream fs = new FileInputStream(file); 
    HWPFDocument doc = new HWPFDocument(fs); 
    WordExtractor we = new WordExtractor(doc); 
    if (we.getParagraphText() != null) { 
        String[] dataArray = we.getParagraphText(); 
        for (int i = 0; i < dataArray.length; i++) { 
            String data = dataArray[i].toString(); 
            Pattern p = Pattern.compile(REGEX); 
            Matcher m = p.matcher(data); 
            List<String> sequences = new Vector<String>(); 
            while (m.find()) { 
                sequences.add(data.substring(m.start(), m.end())); 
                System.out.println(data.substring(m.start(), m.end())); 
            } 
        } 
    } 
} 

With the code and regex above I am getting two uppercase letters, not just the all-uppercase words in parentheses.

1 Answer

Parentheses are reserved characters in regular expressions, so your first * isn't modifying anything. At a minimum, you need to escape them:

\(*[A-Z]*[A-Z]*\)

But don't stop reading! Note that the regex above is equivalent to:

\(*[A-Z]*\)

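Most importantly, though, I don't think that's the regex you want. I think you're trying to capture a non-empty run of consecutive uppercase letters enclosed in parentheses, i.e.: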

\([A-Z]+\)

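The '+' matches one or more occurrences, and you'll notice I've stopped repeating the opening parenthesis. For bonus points, you may want to allow whitespace just inside the opening or closing parenthesis: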

\(\s*[A-Z]+\s*\)

Note, though, that \s also matches newlines. Hope this helps!
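In case it helps to see this in Java: inside a string literal every regex backslash has to be doubled, so \( becomes "\\(". Here is a minimal sketch of the whitespace-tolerant pattern, assuming you only want the word itself and not the surrounding parentheses. The class name ParenWords, the method findParenWords, and the capturing group around [A-Z]+ are my own additions for illustration, not something from your code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ParenWords {
    // Regex \(\s*([A-Z]+)\s*\) written as a Java string literal.
    // Group 1 holds the uppercase word without the parentheses.
    private static final Pattern PAREN_UPPER =
            Pattern.compile("\\(\\s*([A-Z]+)\\s*\\)");

    // Returns every all-uppercase word found inside parentheses.
    static List<String> findParenWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = PAREN_UPPER.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [WORD, NASA]
        System.out.println(
                findParenWords("See (WORD) and ( NASA ), but not (Word) or ()."));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every paragraph. If matching across line breaks is a problem, replacing \\s with [ \\t] should restrict the padding to spaces and tabs.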

Answered 2012-09-06T18:48:48.250