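I have implemented code to count the occurrences of words in a text. However, for some reason my regular expression is not accepted, and I get the following error: Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 12

My code is: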

import java.util.*;

public class CountOccurrenceOfWords {

/**
 * @param args the command line arguments
 */
public static void main(String[] args) {
    // TODO code application logic here
    char lf = '\n';

String text = "It was the best of times, it was the worst of times," + 
lf +
"it was the age of wisdom, it was the age of foolishness," + 
lf +
"it was the epoch of belief, it was the epoch of incredulity," + 
lf +
"it was the season of Light, it was the season of Darkness," + 
lf +
"it was the spring of hope, it was the winter of despair," + 
lf +
"we had everything before us, we had nothing before us," + 
lf +
"we were all going direct to Heaven, we were all going direct" + 
lf +
"the other way--in short, the period was so far like the present" + 
lf +
"period, that some of its noisiest authorities insisted on its" + 
lf +
"being received, for good or for evil, in the superlative degree" + 
lf +
"of comparison only." + 
lf +
"There were a king with a large jaw and a queen with a plain face," + 
lf +
"on the throne of England; there were a king with a large jaw and" + 
lf +
"a queen with a fair face, on the throne of France.  In both" + 
lf +
"countries it was clearer than crystal to the lords of the State" + 
lf +
"preserves of loaves and fishes, that things in general were" + 
lf +
"settled for ever";

    TreeMap<String, Integer> map = new TreeMap<String, Integer>();
    String[] words = text.split("[\n\t\r.,;:!?(){");
    for(int i = 0; i < words.length; i++){
        String key = words[i].toLowerCase();

        if(key.length() > 0) {
            if(map.get(key) == null){
                map.put(key, 1);
            }
            else{
                int value = map.get(key);
                value++;
                map.put(key, value);
            }
        }
    }

    Set<Map.Entry<String, Integer>> entrySet = map.entrySet();

    //Get key and value from each entry
    for(Map.Entry<String, Integer> entry: entrySet){
        System.out.println(entry.getValue() + "\t" + entry.getKey());
    }
    }
}

Also, could you give me a hint on how to arrange the words in alphabetical order? Thanks in advance.

3 Answers

You are missing the "]" at the end of the regular expression.

"[\n\t\r.,;:!?(){" is incorrect.

You need to replace your regular expression with "[\n\t\r.,;:!?(){]"
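
The difference between the two patterns can be checked without running the whole program. The following is only a minimal sketch: the class name CharacterClassCheck and the helper compiles are made up for illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, used only to illustrate the point above.
public class CharacterClassCheck {

    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String unclosed = "[\n\t\r.,;:!?(){";  // pattern from the question, no closing bracket
        String closed = "[\n\t\r.,;:!?(){]";   // same pattern with the closing bracket added

        System.out.println("unclosed compiles: " + compiles(unclosed));
        System.out.println("closed compiles:   " + compiles(closed));
    }
}
```

The first pattern should be rejected with the same PatternSyntaxException that the question reports, while the second should compile.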

answered 2013-10-23T08:35:20.350

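You need to escape the special characters of the regular expression. In your case, you have not escaped ?, ., {, (, ) and [. Use \\ to do so, for example \\[. You could also consider the predefined character class for whitespace, \s, which matches \r, \t and so on.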
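
As a rough illustration of the \s suggestion, here is a sketch of the word count with whitespace added to the separator class. It assumes Java 8 or later (for Map.merge), and the names WordCountSketch and countWords are hypothetical. It also touches on the sorting part of the question: a TreeMap iterates its keys in natural order, so String keys already come out alphabetically.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the asker's class.
public class WordCountSketch {

    // Counts lower-cased words. TreeMap keeps its keys in natural (alphabetical) order.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // \\s covers space, \n, \t and \r; the trailing + treats a run of separators as one.
        for (String word : text.split("[\\s.,;:!?(){]+")) {
            if (!word.isEmpty()) {
                counts.merge(word.toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "It was the best of times, it was the worst of times,\n"
                + "it was the age of wisdom";
        // Entries are printed in alphabetical key order, in the same format as the question.
        for (Map.Entry<String, Integer> entry : countWords(sample).entrySet()) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

Without whitespace in the class, a phrase such as "It was the best of times" stays a single token, because the original pattern only splits on line breaks, tabs and punctuation.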

answered 2013-10-23T08:33:01.030

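Your problem is an unclosed character class in the regular expression. Regex has some "predefined" characters that you need to escape when you want to match them literally.

A character class is:

With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. (Source)

This means that you have to escape these characters: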

\[\n\t\r\.,;:!\?\(\){

or close the character class:

[\n\t\r\.,;:!\?\(\){]

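Outside a character class, you need to escape the dot, the question mark and the parentheses. Inside a character class, Java treats them literally, so escaping them there is optional.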
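
To see how the escapes behave once the class is closed, the sketch below splits the same input with the escaped and the unescaped class. The class name EscapeInsideClass, both helper methods and the sample string are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical comparison of the two closed character classes.
public class EscapeInsideClass {

    // Closed class without escapes.
    static String[] splitPlain(String text) {
        return text.split("[\n\t\r.,;:!?(){]");
    }

    // Closed class with the dot, question mark and parentheses escaped.
    static String[] splitEscaped(String text) {
        return text.split("[\n\t\r\\.,;:!\\?\\(\\){]");
    }

    public static void main(String[] args) {
        String sample = "one.two?three(four)five{six";
        System.out.println(Arrays.toString(splitPlain(sample)));
        System.out.println(Arrays.toString(splitEscaped(sample)));
    }
}
```

Both calls should print the same six tokens, which suggests that inside square brackets the backslashes are harmless but not required in Java.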

answered 2013-10-23T08:37:06.050