
I have a problem and I really don't know what to do about it.

It's simple: I indexed two entries in Solr:

“Scholastic Reader, Level 2 >” “Scholastic Reader, Level 3 >”

(The > symbol is at the end of each string.)

Search 1: when I search for "Scholastic Reader, Level", the service returns both entries, which is fine.

XML response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
            <str name="indent">on</str>
            <str name="start">0</str>
            <str name="q">type:masterseries AND title:("Scholastic Reader, Level")</str>
            <str name="version">2.2</str>
            <str name="rows">10</str>
        </lst>
    </lst>
    <result name="response" numFound="2" start="0">
        <doc>
            <str name="id">118</str>
            <arr name="title">
                <str>Scholastic Reader, Level 2 ></str>
            </arr>
            <str name="type">masterseries</str>
            <str name="uuid">3bf5b10c-a286-4ad0-9c63-bb402f57a7ed</str>
        </doc>
        <doc>
            <str name="id">118</str>
            <arr name="title">
                <str>Scholastic Reader, Level 3 ></str>
            </arr>
            <str name="type">masterseries</str>
            <str name="uuid">cdb19c28-0988-4375-acf0-916bc6664055</str>
        </doc>
    </result>
</response>

Search 2: searching for "Scholastic Reader, Level 3" returns "Scholastic Reader, Level 3 >". Great!

Query string: type:masterseries AND title:("Scholastic Reader, Level 3")

XML response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
            <str name="indent">on</str>
            <str name="start">0</str>
            <str name="q">type:masterseries AND title:("Scholastic Reader, Level 3")</str>
            <str name="version">2.2</str>
            <str name="rows">10</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0">
        <doc>
            <str name="id">118</str>
            <arr name="title">
                <str>Scholastic Reader, Level 3 ></str>
            </arr>
            <str name="type">masterseries</str>
            <str name="uuid">cdb19c28-0988-4375-acf0-916bc6664055</str>
        </doc>
    </result>
</response>

But here comes the strange part.

Search 3: searching for "Scholastic Reader, Level 2", or even the exact string "Scholastic Reader, Level 2 >", returns nothing.

Query string: type:masterseries AND title:("Scholastic Reader, Level 2")

XML response:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">2</int>
        <lst name="params">
            <str name="indent">on</str>
            <str name="start">0</str>
            <str name="q">type:masterseries AND title:("Scholastic Reader, Level 2")</str>
            <str name="version">2.2</str>
            <str name="rows">10</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
</response>

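If I index titles with other numbers such as 1, 4, 5, or 6, the search works; only the string with Level "2" fails.

Thanks for your help.

Update:

Here is the relevant configuration from my schema.xml file: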

 <fieldType name="text_en" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer type="index">
            <charFilter class="solr.HTMLStripCharFilterFactory" />
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.StopFilterFactory"
                ignoreCase="true" words="lang/stopwords_en.txt"
                enablePositionIncrements="false" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.EnglishPossessiveFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt" />
            <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <charFilter class="solr.HTMLStripCharFilterFactory" />            
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true" />
            <filter class="solr.StopFilterFactory"
                ignoreCase="true" words="lang/stopwords_en.txt"
                enablePositionIncrements="false" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.EnglishPossessiveFilterFactory" />
            <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt" />            
            <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
    </fieldType>

1 Answer


I'd bet your problem is here:

<filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true" />

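Take a look at synonyms.txt. My guess is that you'll find an entry that maps "2" to "too" (if it were "to", the StopFilter would remove it and you would never notice the difference). Because expand=true, this results in a query like: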

"Scholastic Reader Level 2 too"

That is fine for a set of unquoted TermQuerys, but not for a PhraseQuery. To fix it, you could add the SynonymFilter to your "index" analyzer as well.
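As a rough sketch of that change, the "index" analyzer from the question could gain the same SynonymFilterFactory line that the "query" analyzer already has. The placement right after the tokenizer is an assumption that simply mirrors the query analyzer, and the rest of the filter chain is copied unchanged from the question:

```xml
<analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Added (assumed placement): same synonym settings as the query analyzer,
         so indexed titles receive the same expanded tokens as the query -->
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.StopFilterFactory"
        ignoreCase="true" words="lang/stopwords_en.txt"
        enablePositionIncrements="false" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
</analyzer>
```

Documents that are already indexed keep their old tokens, so they would need to be reindexed before the change has any effect. The analysis page in the Solr admin UI should show whether the index and query token streams for "Scholastic Reader, Level 2" now line up.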

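The other possibility I can see is something odd happening because ISOLatin1AccentFilterFactory comes after StopFilter and LowerCaseFilter in the query analyzer, since the order in which filters are applied can lead to different output, but I very much doubt that is the problem.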

Answered 2013-01-30T19:29:59.623