1

I have a working search where, if someone searches for two separate words (like "red barn"), Lucene does a great job of returning records that contain "red barn", "barn red", and "red tractor next to the big brown barn". That's great, but the results do not include anything that contains "redbarn" (unless you specifically search for "redbarn", but then you don't get the "red barn" records).

I'm just using the standard analyzer at the moment, but I'm not sure what needs to change in order to get all the records I'd like.

If it matters, I'm using the NEST client on top of ElasticSearch (which is Lucene under the hood). I've researched the various analyzers and properties available but haven't found the right combination to do this.

4

2 Answers

3

The best approach is to write an analyzer that tokenizes "redbarn" into ["red", "barn"]. Lucene already does this for German; take a look at DictionaryCompoundWordTokenFilter, for example.
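
To make the idea concrete, here is a minimal plain-Java sketch of dictionary-based decompounding. It is not Lucene's implementation, only an illustration of the technique; the class name `CompoundSplitSketch`, the method `decompound`, and the `minSubwordSize` parameter are hypothetical names chosen for this example. It keeps the original token and adds every dictionary word found inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompoundSplitSketch {

    /**
     * Returns the original token followed by every dictionary word that
     * occurs as a substring of it (illustrative only, not Lucene's code).
     */
    public static List<String> decompound(String token, Set<String> dictionary, int minSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(token); // keep the original token so exact matches still work
        String lower = token.toLowerCase();
        // Try every substring of at least minSubwordSize characters.
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            for (int end = start + minSubwordSize; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                // Skip the whole token: it is already in the output.
                if (!candidate.equals(lower) && dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; a real one would be a full word list.
        Set<String> dictionary = new HashSet<>(Arrays.asList("red", "barn"));
        System.out.println(decompound("redbarn", dictionary, 3)); // [redbarn, red, barn]
        System.out.println(decompound("tractor", dictionary, 3)); // [tractor]
    }
}
```

If the field is analyzed this way at index time, a record containing "redbarn" would presumably also be indexed under "red" and "barn", so a search for "red barn" could match it.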

Answered 2012-07-16T13:52:59.243
-1

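The standard analyzer works for most cases, but if you need more detailed text analysis, you will have to write your own analyzer.

The WordDelimiterFilter that ships with Solr should solve your problem. Solr is built on top of Lucene, so using a filter that ships with Solr should not cause any trouble. See the example below: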

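import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.solr.analysis.WordDelimiterFilter;

public class CustomAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new WhitespaceTokenizer(reader);
    // Solr 1.4-era flags: generateWordParts, generateNumberParts, catenateWords, catenateNumbers, catenateAll
    ts = new WordDelimiterFilter(ts, 1, 1, 1, 1, 1);
    ts = new LowerCaseFilter(ts);
    return ts;
  }
}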
Answered 2012-07-16T17:14:50.790