
According to corporatezen.com/2013/11/updating-solr-engine-coldfusion, the CF10 I'm using should be running Solr 3.4. I added <charFilter class="solr.HTMLStripCharFilterFactory"/> to <fieldType name="text">, but the summary field in the search results still contains HTML. Any idea why?

<field name="summary" type="text" indexed="false" stored="true" required="false" />

http://localhost:8985/solr/test/admin/schema.jsp shows:

Field: summary

Field Type: TEXT

Properties: Tokenized, Stored

Schema: Tokenized, Stored

Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain

Char Filters:

org.apache.solr.analysis.HTMLStripCharFilterFactory args:{luceneMatchVersion: LUCENE_24 }

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 luceneMatchVersion: LUCENE_24 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 }

org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{luceneMatchVersion: LUCENE_24 }

Query Analyzer: org.apache.solr.analysis.TokenizerChain

Char Filters:

org.apache.solr.analysis.HTMLStripCharFilterFactory args:{luceneMatchVersion: LUCENE_24 }

Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory

Filters:

org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 luceneMatchVersion: LUCENE_24 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 }

org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt luceneMatchVersion: LUCENE_24 }

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{luceneMatchVersion: LUCENE_24 }


1 Answer


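You need to distinguish between what is stored and what is indexed. The filters you added to the field type change the tokens that go into the Solr index for searching, but they do not change the stored value that is returned for display.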

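Solr keeps two versions of a field*. One is the stored version: the original text, which in your case contains HTML. The other is the indexed version, to which the full text-analysis chain has been applied.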

When you run a search, the index is used to find which documents match. When the results are displayed, you get the stored version.
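To make the distinction concrete, here is a minimal Python sketch (a toy model, not Solr code) of one field that keeps both versions. The names ToyField, strip_html, analyze and search are hypothetical; the regex in strip_html is only a crude stand-in for HTMLStripCharFilterFactory, and whitespace splitting plus lowercasing stands in for the rest of the analyzer chain.

```python
import re
from dataclasses import dataclass, field


def strip_html(text: str) -> str:
    """Crude stand-in for HTMLStripCharFilterFactory: replace anything tag-like with a space."""
    return re.sub(r"<[^>]*>", " ", text)


def analyze(text: str) -> list[str]:
    """Toy analyzer chain: char filter, then whitespace tokenizer, then lowercasing."""
    return [token.lower() for token in strip_html(text).split()]


@dataclass
class ToyField:
    stored: str  # original value, returned unchanged when results are displayed
    indexed: list[str] = field(default_factory=list)  # analyzed tokens, used only for matching

    @classmethod
    def from_raw(cls, raw: str) -> "ToyField":
        # The analyzer only produces the indexed tokens; the stored copy is left untouched.
        return cls(stored=raw, indexed=analyze(raw))


def search(docs: dict[str, ToyField], query: str) -> list[str]:
    """Match the query against indexed tokens, but return the stored values for display."""
    terms = analyze(query)
    return [doc.stored for doc in docs.values() if all(t in doc.indexed for t in terms)]


docs = {"1": ToyField.from_raw("<p>Updating the <b>Solr</b> engine</p>")}
print(docs["1"].indexed)     # ['updating', 'the', 'solr', 'engine']
print(search(docs, "solr"))  # ['<p>Updating the <b>Solr</b> engine</p>']
```

The match happens on HTML-free tokens, yet the hit that comes back still carries the original markup, which is the behaviour described in the question.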


* Of course, only if you have set both stored="true" and indexed="true".
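The footnote can be illustrated with another toy Python sketch (hypothetical helper names, not a Solr API) showing which copies exist for each combination of the two flags. With the settings from the question's summary field (indexed="false" stored="true"), only the raw stored copy exists, so the analyzer chain, HTML char filter included, is never applied to that field at index time.

```python
import re


def strip_html(text: str) -> str:
    """Crude stand-in for HTMLStripCharFilterFactory (hypothetical helper)."""
    return re.sub(r"<[^>]*>", " ", text)


def field_versions(raw: str, stored: bool, indexed: bool) -> dict[str, object]:
    """Return which copies of a field value this toy index would keep for the given flags."""
    versions: dict[str, object] = {}
    if stored:
        # Stored copy: the original text, HTML included, exactly as it was sent.
        versions["stored"] = raw
    if indexed:
        # Indexed copy: the only place the toy analyzer chain runs.
        versions["indexed"] = [t.lower() for t in strip_html(raw).split()]
    return versions


raw = "<p>Hello <b>World</b></p>"
# Flags from the question's summary field: indexed="false" stored="true"
print(field_versions(raw, stored=True, indexed=False))
# {'stored': '<p>Hello <b>World</b></p>'}
print(field_versions(raw, stored=True, indexed=True))
# {'stored': '<p>Hello <b>World</b></p>', 'indexed': ['hello', 'world']}
```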

answered 2015-02-24T08:18:19.827