I need to implement a Pattern using regex = \w (or whatever matches all words).
When I run the program, the output should be:
a [1]
is [1]
test [1, 2]
But instead it is:
a [1]
e [2]
h [1]
i [1, 1]
s [1, 1, 2]
t [1, 2, 2]
The code responsible for scanning and pattern matching is as follows:
public class DocumentIndex {
    private TreeMap<String, ArrayList<Integer>> map =
            new TreeMap<String, ArrayList<Integer>>(); // Stores words and their locations
    private String regex = "\\w"; // any word

    /**
     * A constructor that scans a document for words and their locations
     */
    public DocumentIndex(Scanner doc) {
        Pattern p = Pattern.compile(regex); // Pattern class: matches words
        Integer location = 0; // the current line number
        // while the document has lines
        // set the Matcher to the current line
        while (doc.hasNextLine()) {
            location++;
            Matcher m = p.matcher(doc.nextLine());
            // while there are values in the current line
            // check to see if they are words
            // and if so save them to the map
            while (m.find()) {
                if (map.containsKey(m.group())) {
                    map.get(m.group()).add(location);
                } else {
                    ArrayList<Integer> list = new ArrayList<Integer>();
                    list.add(location);
                    map.put(m.group(), list);
                }
            }
        }
    }
    ...
}
What is the best way to make the pattern read whole words?
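
For illustration, here is a minimal sketch of one possible fix. In `java.util.regex`, `\w` matches exactly one word character (a letter, digit, or underscore), which would explain why the index ends up keyed by single letters. Adding the `+` quantifier (`\\w+`) makes each match span a run of consecutive word characters, i.e. a whole word. The class name `WordIndexSketch`, the static method `index`, and the two-line sample input are assumptions made up for this example; they are not part of the original `DocumentIndex` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for DocumentIndex, reduced to the scanning logic.
class WordIndexSketch {
    // "\\w+" = one or more word characters in a row, so each match is a whole word.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Maps each word to the (1-based) line numbers on which it occurs.
    public static Map<String, List<Integer>> index(Scanner doc) {
        Map<String, List<Integer>> map = new TreeMap<>();
        int location = 0; // the current line number
        while (doc.hasNextLine()) {
            location++;
            Matcher m = WORD.matcher(doc.nextLine());
            while (m.find()) {
                // Create the list on first sight of the word, then record the line.
                map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(location);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Assumed sample document: two lines, with "test" on both.
        Scanner doc = new Scanner("this is a test\ntest");
        for (Map.Entry<String, List<Integer>> e : index(doc).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // Prints:
        // a [1]
        // is [1]
        // test [1, 2]
        // this [1]
    }
}
```

Note that `\w+` also treats digits and underscores as part of a word, and that matching is case-sensitive, so `Test` and `test` would be indexed separately. If only letters should count, a pattern such as `[A-Za-z]+` may be a closer fit.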