I am trying to apply multiple filters to a TokenStream in my custom analyzer. Here is the code:

public class CustomizeAnalyzer extends Analyzer {
//code omitted

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);              
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);                
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);                  
    return new TokenStreamComponents(source, new PorterStemFilter(source));
}                                              
}

However, the LowerCaseFilter is never applied. I followed the documentation here to the letter. Can someone explain how to make this work?

Thanks a lot,

1 Answer

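Your problem is in the last line. You build a filter chain and then short-circuit it in the return statement by passing new PorterStemFilter(source), which puts the stem filter directly on the tokenizer rather than on the filters earlier in the chain. As a result, the LowerCaseFilter and StopFilter are never part of the stream that gets returned. It should be: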

@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);              
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);                
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);                  
    filter = new PorterStemFilter(filter);
    return new TokenStreamComponents(source, filter);
} 
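
To see why the original version skips the lower-casing step, here is a minimal sketch of the same wrapping pattern in plain Java with no Lucene dependency. Everything in it (FilterChainSketch, tokenize, lowerCase, stripPlural, drain) is a hypothetical stand-in invented for illustration, not part of the Lucene API, and the "stemmer" just drops a trailing "s". The point is only that each filter wraps whatever stream it is handed, so the last wrapper has to receive the previous wrapper and not the raw source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class FilterChainSketch {

    // Hypothetical stand-in for a tokenizer: splits the text on whitespace.
    public static Iterator<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    // Hypothetical stand-in for LowerCaseFilter: wraps an upstream token stream.
    public static Iterator<String> lowerCase(final Iterator<String> in) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return in.hasNext(); }
            @Override public String next() { return in.next().toLowerCase(Locale.ROOT); }
        };
    }

    // Hypothetical stand-in for a stem filter: drops one trailing "s".
    public static Iterator<String> stripPlural(final Iterator<String> in) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return in.hasNext(); }
            @Override public String next() {
                String token = in.next();
                boolean plural = token.length() > 1 && token.endsWith("s");
                return plural ? token.substring(0, token.length() - 1) : token;
            }
        };
    }

    // Collects every token the stream produces into a list.
    public static List<String> drain(Iterator<String> stream) {
        List<String> tokens = new ArrayList<>();
        while (stream.hasNext()) {
            tokens.add(stream.next());
        }
        return tokens;
    }

    // Mirrors the question: the chain is built, but the last filter wraps the source.
    public static List<String> broken(String text) {
        Iterator<String> source = tokenize(text);
        Iterator<String> filter = lowerCase(source); // built but never consumed
        return drain(stripPlural(source));
    }

    // Mirrors the answer: every filter wraps the previous one.
    public static List<String> fixed(String text) {
        Iterator<String> source = tokenize(text);
        Iterator<String> filter = lowerCase(source);
        filter = stripPlural(filter);
        return drain(filter);
    }

    public static void main(String[] args) {
        System.out.println(broken("Cats Run")); // prints [Cat, Run]
        System.out.println(fixed("Cats Run"));  // prints [cat, run]
    }
}
```

In the broken variant the lower-casing wrapper exists but nothing ever pulls tokens through it, which is the same situation as passing source to PorterStemFilter in the question.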
answered 2013-10-08T23:32:13.490