
When I search for USC in Apache Solr 3.6 with highlighting enabled, why does the highlighted result not also mark U.S.C.?

The field type is as follows:

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

I want Solr to highlight both USC and U.S.C. in the search results.

However, it only highlights USC:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">7</int>
  <lst name="params">
    <str name="explainOther"/>
    <str name="fl">*,score</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">USC</str>
    <str name="hl.fl">*</str>
    <str name="wt"/>
    <str name="fq"/>
    <str name="hl">on</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="0.047945753">
  <doc>
    <float name="score">0.047945753</float>
    <str name="id">978-064172344522</str>
    <arr name="title">
      <str>my <a href="www.foo.bar">link</a>  power-shot PowerShot USC Utility <br>hello</br> Rejections Under 35 U.S.C. 101 and 35 U.S.C. 112, First Paragraph Petitions to correct inventorship of an issued patent are decided by the <Underline>Supervisory Patent Examiner</Underline>, as set forth</str>
    </arr>
  </doc>
</result>
<lst name="highlighting">
  <lst name="978-064172344522">
    <arr name="title">
      <str>my <a href="www.foo.bar">link</a>  power-shot PowerShot <em>USC</em> Utility <br>hello</br> Rejections Under</str>
    </arr>
  </lst>
</lst>
</response>

1 Answer


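If you go to the Analysis page in Solr and run the string "U.S.C." through the text_en_splitting fieldType, you will see that it is indexed as three separate tokens: u, s and c. Experiment with the attributes of WordDelimiterFilterFactory (probably the catenateAll attribute) to see whether you can get it indexed as usc (a single token) instead of three split tokens. If that doesn't work, you may have to extend the tokenizer to fit your case.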
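
A minimal, untested sketch of that suggestion, assuming only the index-time analyzer is changed: it is the index analyzer from the question with catenateAll switched from "0" to "1" on the WordDelimiterFilterFactory line, and everything else left as it was. Whether this really yields a single usc token for "U.S.C." (and whether the highlighter then marks it) is an assumption to confirm on the Analysis page.

```xml
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="lang/stopwords_en.txt"
          enablePositionIncrements="true"/>
  <!-- Only change from the question: catenateAll="1" (was "0"), intended to
       also emit the split parts of "U.S.C." joined together as one token -->
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```

Changes to the index-time analyzer only take effect for documents that are reindexed after the schema is updated.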

answered 2012-06-15T21:28:18.457