I have Solr 4.0 running on my server. Everything works fine except for stop words.
Here is my text field:
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
Here is my text_general field type:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.HyphenatedWordsFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
My stopwords.txt file is in the same folder as this schema.xml. Here is part of the list:
#Standard english stop words taken from Lucene's StopAnalyzer
#a - contained in English alphabet below
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
Searching for any word in this list still returns results from Solr.
Here is part of the debug output:
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">211</int>
<lst name="params">
<str name="debugQuery">true</str>
<str name="fl">id</str>
<str name="indent">true</str>
<str name="q">text:an</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="476462" start="0">
<doc>
<str name="id">5203921</str></doc>
<doc>
<str name="id">826470</str></doc>
<doc>
<str name="id">40853</str></doc>
<doc>
<str name="id">100821</str></doc>
<doc>
<str name="id">735712</str></doc>
<doc>
<str name="id">1826069</str></doc>
<doc>
<str name="id">520764</str></doc>
<doc>
<str name="id">1189586</str></doc>
<doc>
<str name="id">5203322</str></doc>
<doc>
<str name="id">1227851</str></doc>
</result>
<lst name="debug">
<str name="rawquerystring">text:an</str>
<str name="querystring">text:an</str>
<str name="parsedquery">text:an</str>
<str name="parsedquery_toString">text:an</str>
<lst name="explain">
<str name="5203921">
2.2455122 = (MATCH) weight(text:an in 5529393) [DefaultSimilarity], result of:
2.2455122 = fieldWeight in 5529393, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1.0
3.5928197 = idf(docFreq=476462, maxDocs=6369076)
0.625 = fieldNorm(doc=5529393)
</str>
<str name="826470">
1.9053802 = (MATCH) weight(text:an in 2661240) [DefaultSimilarity], result of:
1.9053802 = fieldWeight in 2661240, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5928197 = idf(docFreq=476462, maxDocs=6369076)
0.375 = fieldNorm(doc=2661240)
</str>
The query still returns results from Solr. What am I missing?
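One thing I can check, as a sketch rather than a confirmed fix: the debug output shows that parsedquery is still text:an, so the query-time stop filter does not seem to have removed the term. Assuming the field analysis handler is registered in solrconfig.xml (it is in the example config that ships with Solr 4.x; the entry below is that assumption, not something copied from my own config), I can ask Solr how text_general actually analyzes "an":

```xml
<!-- Assumed solrconfig.xml entry, as found in the stock Solr 4.x example config -->
<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy"/>
<!-- Example request; host, port and core path are placeholders:
     http://localhost:8983/solr/analysis/field?analysis.fieldtype=text_general&analysis.fieldvalue=an&analysis.query=an -->
```

If "an" is still present after the StopFilterFactory stage in that output, the running core is probably not using this schema.xml or this stopwords.txt, for example because it was not reloaded after the change.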