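In Solr, when I use solr.ShingleFilterFactory to combine tokens, it can emit several shingles, depending on minShingleSize/maxShingleSize and the tokens being combined. Because the index analyzer stores each value as a single token, a query that produces more than one shingle does not match, so the search fails. How can I combine multiple tokens into one so that my search works? Here is my setup: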

<fieldType name="text_ngram" class="solr.TextField">
    <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b \b" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
        <filter class="solr.ShingleFilterFactory" tokenSeparator="" minShingleSize="2" maxShingleSize="7" outputUnigrams="false"/>
        <filter class="solr.LengthFilterFactory" min="6" max="7"/>
    </analyzer>
</fieldType>

Here is the debug output for the query "our G20 9NS" on name_ngram:

"debug": {
    "rawquerystring": "name_ngram:\"our G20 9NS\"",
    "querystring": "name_ngram:\"our G20 9NS\"",
    "parsedquery": "PhraseQuery(name_ngram:\"rg209ns g209ns\")",
    "parsedquery_toString": "name_ngram:\"rg209ns g209ns\"",
    "explain": {},

Thanks in advance,

2 Answers

I faced the same challenge and solved it without any custom code, like this:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
<filter class="solr.FingerprintFilterFactory" separator="_" />
<filter class="solr.PatternReplaceFilterFactory" pattern="(_)" replacement="" replace="all"/>

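The key is to fingerprint the tokens with "_" as the separator and then replace every "_" with an empty string. (The FingerprintFilterFactory separator must be a single character, so it cannot simply be set to empty.)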
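A fuller sketch of this approach follows, under a few assumptions not stated in the answer. FingerprintFilter sorts and de-duplicates the tokens before joining them, so "G20 9NS" becomes "9ns_g20" and then "9nsg20"; that token would not equal the "g209ns" produced by the question's index analyzer. For that reason this sketch uses one shared <analyzer> for both indexing and querying. The field type name text_fingerprint is hypothetical, and the LowerCaseFilterFactory is an addition not present in the answer above. Untested.

```xml
<!-- Hypothetical field type: one analyzer (no type attribute), so the same
     chain runs at index time and at query time. -->
<fieldType name="text_fingerprint" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Added here (assumption): lowercase so case differences do not matter -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <!-- Sorts, de-duplicates and joins all tokens into one, e.g. "9ns_g20" -->
    <filter class="solr.FingerprintFilterFactory" separator="_"/>
    <!-- Strips the separators, e.g. "9ns_g20" becomes "9nsg20" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

A side effect of the sorting is that word order stops mattering: "9NS G20" yields the same token as "G20 9NS". Whether that is acceptable depends on the data.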

Hope it helps.

answered 2018-11-09T01:53:42.270

I was able to solve this by moving the synonym mapping out of the Solr configuration; I wrote some custom code to handle it. Here is the final schema:

<!-- Added for NGram field-->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b \b" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\b \b" replacement=""/>
  </analyzer>
</fieldType>
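Both analyzers in this schema end up producing the same token (single spaces between word characters removed, the whole value kept as one token, lowercased), so they could probably be collapsed into one analyzer used at both index and query time. A minimal, untested sketch, assuming no other query-time filters are needed:

```xml
<!-- One analyzer (no type attribute) shared by indexing and querying -->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <!-- Remove a single space that sits between two word characters -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b \b" replacement=""/>
    <!-- Keep the whole (space-stripped) value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Two caveats apply. The pattern \b \b only matches a space with a word character on each side, so a double space is not removed. The approach also relies on the whole query string reaching the analyzer in one piece, as with the quoted phrase in the question's debug output; unquoted input may be split on whitespace by the query parser first, depending on the Solr version and the sow parameter.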

answered 2016-02-01T15:20:44.030