前面的示例将词干应用于搜索查询,因此如果您对全文词干感兴趣,可以尝试以下操作:
import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.tokenattributes.*;
import org.apache.lucene.analysis.snowball.*;
import org.apache.lucene.util.*;
...
public class Stemmer{
public static String Stem(String text, String language){
StringBuffer result = new StringBuffer();
if (text!=null && text.trim().length()>0){
StringReader tReader = new StringReader(text);
Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_35,language);
TokenStream tStream = analyzer.tokenStream("contents", tReader);
TermAttribute term = tStream.addAttribute(TermAttribute.class);
try {
while (tStream.incrementToken()){
result.append(term.term());
result.append(" ");
}
} catch (IOException ioe){
System.out.println("Error: "+ioe.getMessage());
}
}
// If, for some reason, the stemming did not happen, return the original text
if (result.length()==0)
result.append(text);
return result.toString().trim();
}
public static void main (String[] args){
Stemmer.Stem("Michele Bachmann amenities pressed her allegations that the former head of her Iowa presidential bid was bribed by the campaign of rival Ron Paul to endorse him, even as one of her own aides denied the charge.", "English");
}
}
TermAttribute 类已被弃用,Lucene 4 将不再支持,但文档并不清楚在其位置使用什么。
同样在第一个示例中,PorterStemmer 不能作为类(隐藏)使用,因此您不能直接使用它。
希望这可以帮助。