I am running Solr 4.0 on my server. Everything works fine except stop words.

Here is my text field:

 <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

Here is my text_general field type:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.HyphenatedWordsFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>

My stop words file is in the same folder as this schema.xml. Here is part of the list:

#Standard english stop words taken from Lucene's StopAnalyzer
#a - contained in English alphabet below
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or

Searching for any word in this list still returns results in Solr.
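As a quick sanity check on the file itself, here is a minimal Python sketch that parses a stop word list and tests whether a term is in it. It assumes the usual convention of one word per line, with blank lines and lines starting with `#` ignored, and it lowercases entries to mimic `ignoreCase="true"`. The `parse_stopwords` helper and the `sample` lines are hypothetical and only for illustration; this is not Solr's own loader.

```python
def parse_stopwords(lines):
    """Collect stop words from an iterable of text lines.

    Assumption: one word per line; blank lines and lines starting
    with '#' are skipped. Words are lowercased to mimic ignoreCase="true".
    """
    words = set()
    for line in lines:
        word = line.strip()
        if not word or word.startswith("#"):
            continue
        words.add(word.lower())
    return words


# Hypothetical sample mirroring the first lines of the list above.
sample = [
    "#Standard english stop words taken from Lucene's StopAnalyzer",
    "#a - contained in English alphabet below",
    "an",
    "and",
    "are",
]
stopwords = parse_stopwords(sample)
print("an" in stopwords)  # prints True
```

To run it against the real file, pass an open file handle instead of `sample`, for example `parse_stopwords(open("stopwords.txt", encoding="utf-8"))`. This only confirms that the file contents parse as expected; it says nothing about whether Solr actually loaded the file.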

Here is part of the debug output:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">211</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="fl">id</str>
    <str name="indent">true</str>
    <str name="q">text:an</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="476462" start="0">
  <doc>
    <str name="id">5203921</str></doc>
  <doc>
    <str name="id">826470</str></doc>
  <doc>
    <str name="id">40853</str></doc>
  <doc>
    <str name="id">100821</str></doc>
  <doc>
    <str name="id">735712</str></doc>
  <doc>
    <str name="id">1826069</str></doc>
  <doc>
    <str name="id">520764</str></doc>
  <doc>
    <str name="id">1189586</str></doc>
  <doc>
    <str name="id">5203322</str></doc>
  <doc>
    <str name="id">1227851</str></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">text:an</str>
  <str name="querystring">text:an</str>
  <str name="parsedquery">text:an</str>
  <str name="parsedquery_toString">text:an</str>
  <lst name="explain">
    <str name="5203921">
2.2455122 = (MATCH) weight(text:an in 5529393) [DefaultSimilarity], result of:
  2.2455122 = fieldWeight in 5529393, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    3.5928197 = idf(docFreq=476462, maxDocs=6369076)
    0.625 = fieldNorm(doc=5529393)
</str>
    <str name="826470">
1.9053802 = (MATCH) weight(text:an in 2661240) [DefaultSimilarity], result of:
  1.9053802 = fieldWeight in 2661240, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    3.5928197 = idf(docFreq=476462, maxDocs=6369076)
    0.375 = fieldNorm(doc=2661240)
</str>

The query for a stop word still returns results from Solr. What am I missing?

1 Answer

Use an absolute path. Replace

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

with

<filter class="solr.StopFilterFactory" ignoreCase="true" words="/the/absolute/path/stopwords.txt" enablePositionIncrements="true" />
answered 2014-07-18T07:28:16.550