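I'm running into a problem with Solr stop words in my autosuggest: every stop word is replaced by the _ symbol.

For example, I have the text "simple text" in the "deal_title" field. When I search for the word "simple", Solr returns the result "_ simple text _", but I expect "simple text".

Can someone explain why this happens and how to fix it? Here is part of my schema.xml: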

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" /> 
    </analyzer> 
    <analyzer type="query">
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
        <tokenizer class="solr.StandardTokenizerFactory"/> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    </analyzer>
</fieldType>

<field name="deal_title" type="text_auto" indexed="true" stored="true" required="false" multiValued="false"/>

<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

2 Answers

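My solution for this in Solr 6.3 (where enablePositionIncrements="false" is no longer possible) is:

  1. Remove the stop words
  2. Shingle with fillerToken="" (this gets rid of the _)
  3. Remove leading and trailing spaces
  4. Remove duplicates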

    <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
    <filter class="solr.ShingleFilterFactory" fillerToken=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(^ | $)" replacement=""/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
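
As a rough sketch of how these four filters might sit inside a complete field type, the block below places them in the index analyzer of the "text_auto" type from the question. The maxShingleSize, outputUnigrams and stopwords.txt values are copied from the question; the query analyzer and the position of the lowercase filter are assumptions, not something this answer specifies.

```xml
<!-- Sketch only: the four filters above dropped into the question's text_auto type. -->
<fieldType name="text_auto" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 1. remove stop words; this leaves position gaps in the token stream -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- 2. build shingles; an empty fillerToken replaces the default "_" in those gaps -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" fillerToken=""/>
    <!-- 3. trim the single space left at the start or end where a filler used to be -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="(^ | $)" replacement=""/>
    <!-- 4. drop identical tokens that end up at the same position -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumed query side: no shingling, only tokenizing and lowercasing. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index analyzer changes, existing documents have to be reindexed before the suggestions change. The pattern trims only one space per end, so titles with several consecutive stop words may need a broader pattern (an untested assumption).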
    

Answered 2017-01-10T10:22:55.523

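To solve this, use <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="false" /> in the analyzer and set <luceneMatchVersion>4.3</luceneMatchVersion> in solrconfig.xml. The match version matters because enablePositionIncrements="false" is rejected from Lucene match version 4.4 onward.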
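
Since the two settings belong in different files, here is a minimal sketch of where each one would go. It assumes the usual layout with a <config> root in solrconfig.xml and the "text_auto" field type from the question in schema.xml; everything else in both files is left out.

```xml
<!-- solrconfig.xml: pin the Lucene compatibility version (value taken from this answer) -->
<config>
  <luceneMatchVersion>4.3</luceneMatchVersion>
  <!-- ... rest of the configuration unchanged ... -->
</config>

<!-- schema.xml: inside the index and query <analyzer> of the text_auto field type -->
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="false"/>
```

Lowering luceneMatchVersion affects every analyzer in the core, not only this field type, so it is worth checking other fields after the change and reindexing.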

Answered 2015-02-12T13:44:03.860