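I'm having trouble with a Java program that builds a word-to-frequency map for a given document. When I print out all the words, the empty string "" still shows up as a "word".
Here is the paraphrased code: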
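import java.util.*;
String delimiters = "[^a-zA-Z0-9]+";
String[] words;
String line;
SortedSet<String> allWords = new TreeSet<String>();
Map<String, Map<String, Integer>> wordMap = new HashMap<String, Map<String, Integer>>();
// fileName: path of the document being read, e.g. books/dickens.txt (hypothetical name; the original is paraphrased)
while ((line = bufferedReader.readLine()) != null) {
    words = line.split(delimiters);
    for (String word : words) {
        allWords.add(word);  // add the word to the allWords set
        wordMap.computeIfAbsent(word, k -> new HashMap<String, Integer>())
               .merge(fileName, 1, Integer::sum);  // bump this word's count for this file
    }
}
for (String word : allWords) {
    System.out.println(word + " : " + wordMap.get(word).entrySet());
}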
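The `line.split(delimiters)` call in the loop above can be exercised on its own. The sketch below is a minimal, hypothetical demo (the class `SplitLeadingDelimiter` and its `split` helper are made up; only the regex comes from the code above). Per the `String.split` Javadoc, a positive-width match at the very start of the string produces a leading empty string, while trailing empty strings are discarded, and an input with no match at all (such as an empty line) comes back as a one-element array containing that input.

```java
import java.util.Arrays;

// Hypothetical demo class; only the delimiter regex is taken from the question.
public class SplitLeadingDelimiter {
    static final String DELIMITERS = "[^a-zA-Z0-9]+";

    static String[] split(String line) {
        return line.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // The line starts with a quotation mark, which matches the delimiter regex.
        String[] quoted = split("\"Hello, world\"");
        // The leading "" is kept; the trailing one (after the closing quote) is dropped.
        System.out.println(quoted.length + " " + Arrays.toString(quoted)); // prints 3 [, Hello, world]
        // An empty line has no match, so split returns the input itself: one "".
        String[] empty = split("");
        System.out.println(empty.length); // prints 1
    }
}
```

If that reading is right, every line of the book that begins with a quote, a space, or other punctuation (and every blank line) would contribute one "" to `allWords` and `wordMap`.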
Here is some sample output:
Time elapsed: 0.75 seconds.
: [books/dickens.txt=7] // WHAT ARE YOU?!?! How does this happen??!?!
10 : [books/dickens.txt=2]
11th : [books/dickens.txt=2]
12th : [books/dickens.txt=2]
How does this blank entry show up? Thanks.
PS: if you want to see the full code, here is a link
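One possible workaround, sketched under the assumption that the goal is simply to ignore empty tokens: filter them out before they reach `allWords` and `wordMap`. `WordTokenizer` and `tokenize` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the regex is the one from the question.
public class WordTokenizer {
    static final String DELIMITERS = "[^a-zA-Z0-9]+";

    // Splits a line into words and drops the empty tokens that String.split
    // yields when the line starts with a delimiter or is itself empty.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.split(DELIMITERS)) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"Hello, world\"")); // prints [Hello, world]
        System.out.println(tokenize(""));                 // prints []
    }
}
```

The filter is used here rather than only stripping leading delimiters (for example with `replaceFirst`) because an empty line would still split into a single "" in that case; the `isEmpty()` check covers both situations.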