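I am migrating from Solr 3.x to 4.x and running some queries to verify that everything still works as before. However, the query "galaxy s3" now returns far fewer results: numFound=1628 in 3.x versus numFound=70 in 4.x.
This is the relevant part of the schema: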
<fieldtype name="text_pt" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="IIIHYPHENIII"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="IIIHYPHENIII" replacement="-"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" preserveOriginal="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt"/>
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="IIIHYPHENIII"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="IIIHYPHENIII" replacement="-"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="portugueseSynonyms.txt" expand="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" preserveOriginal="1" catenateNumbers="0" catenateAll="0" protected="protwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt"/>
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldtype>
The synonyms involved in this query are:
siii, s3
galaxy, galax
My default search operator is AND (in both versions, even though it is deprecated in 4.x), and the debug output is:
SOLR 3.x
<str name="parsedquery">+(title_search_pt:galaxy title_search_pt:galax)
+MultiPhraseQuery(title_search_pt:"(sii s3 s) 3")</str>
SOLR 4.x
<str name="parsedquery">+((title_search_pt:galaxy title_search_pt:galax)/no_coord)
+(+title_search_pt:sii +title_search_pt:s3 +title_search_pt:s +title_search_pt:3)</str>
The odd part is that 4.x does not return documents such as "galaxy s3". This is the debug explanation for one of them:
no match on required clause (+title_search_pt:sii +title_search_pt:s3 +title_search_pt:s +title_search_pt:3)
(NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s), no match on required clause (title_search_pt:sii)
(NON-MATCH) no matching term
(MATCH) weight(title_search_pt:s3 in 1834535)
(MATCH) weight(title_search_pt:s in 1834535)
(MATCH) weight(title_search_pt:3 in 1834535)
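Why is sii a required clause, when it should be ORed with s and s3?
The analysis output shows sii at token position 2, the same position as its synonyms, like this: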
galaxy  sii  3
galax   s3
        s
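To make the difference between the two parsed queries concrete, here is a minimal Python sketch of how I understand them. It is not Solr code: the `doc` dictionary, its token positions, and the helper functions are all hypothetical, and the positions are my assumption of what the index analyzer produces for the title "galaxy s3".

```python
# Hypothetical model of one indexed title, "galaxy s3": term -> set of positions.
# The positions are an assumption based on the index-time WordDelimiterFilter
# settings (preserveOriginal=1 keeps "s3"; "s" and "3" are split out).
doc = {
    "galaxy": {1},
    "s3": {2},
    "s": {2},
    "3": {3},
}


def any_term(doc, terms):
    """OR of terms, like (title_search_pt:galaxy title_search_pt:galax)."""
    return any(t in doc for t in terms)


def all_terms(doc, terms):
    """AND of terms, like (+sii +s3 +s +3) in the 4.x parsed query."""
    return all(t in doc for t in terms)


def multi_phrase(doc, slots):
    """Rough stand-in for a MultiPhraseQuery with no slop: each slot lists
    alternative terms, and consecutive slots must match at consecutive positions."""
    starts = set()
    for term in slots[0]:
        starts |= doc.get(term, set())
    for start in starts:
        if all(
            any(start + offset in doc.get(term, set()) for term in slot)
            for offset, slot in enumerate(slots)
        ):
            return True
    return False


def match_3x(doc):
    # +(galaxy galax) +MultiPhraseQuery("(sii s3 s) 3")
    return any_term(doc, ["galaxy", "galax"]) and multi_phrase(
        doc, [["sii", "s3", "s"], ["3"]]
    )


def match_4x(doc):
    # +(galaxy galax) +(+sii +s3 +s +3)
    return any_term(doc, ["galaxy", "galax"]) and all_terms(
        doc, ["sii", "s3", "s", "3"]
    )


print("3.x matches:", match_3x(doc))
print("4.x matches:", match_4x(doc))
```

Under these assumptions the 3.x form matches the document because any one of sii, s3, or s may fill position 2, while the 4.x form fails because sii is never indexed for this title, which agrees with the NON-MATCH in the debug output above.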