With solr.HTMLStripCharFilterFactory, you can only stop the HTML tags from being indexed, not from being stored.

In other words, a search for "すもももももももものうち" returns results (with the HTML tags still present in the stored value, of course), but a search for "<p>すもももももももものうち</p>" does not.

Note: This assumes that you don't strip HTML tags at query time.
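
As a rough sketch of the setup described above, the field type below strips HTML only in the index-time analyzer and leaves the query-time analyzer untouched. The names `text_cjk_html` and `content` are hypothetical, and the tokenizer is simply the one used elsewhere in this answer; adjust both to your schema.

```xml
<!-- Hypothetical field type: HTML is stripped before tokenizing at index time only -->
<fieldType name="text_cjk_html" class="solr.TextField">
    <analyzer type="index">
        <!-- Removes HTML markup from the character stream that gets indexed -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <!-- No HTML stripping here, matching the assumption in the note above -->
        <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
</fieldType>

<!-- Hypothetical field: stored="true" keeps the original text, HTML tags included -->
<field name="content" type="text_cjk_html" indexed="true" stored="true"/>
```

Because char filters only affect the analysis chain, the stored value returned in search results is still the original input with its tags.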

If you don't want these HTML tags to be indexed, you can also use solr.PatternReplaceCharFilterFactory.

Your configuration might look like this:

    <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory" 
                    pattern="Your regular expression to match HTML tags" 
                    replacement=""/>
        <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
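
For illustration only, here is the same analyzer with one possible pattern filled in. The regular expression `<[^>]*>` is an assumption, not something prescribed above: it matches simple tags but does not handle comments, script blocks, or HTML entities. Note that `<` has to be escaped as `&lt;` inside an XML attribute value.

```xml
<analyzer>
    <!-- Illustrative pattern (an assumption): removes anything shaped like <...> -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="&lt;[^&gt;]*&gt;"
                replacement=""/>
    <tokenizer class="solr.CJKTokenizerFactory"/>
</analyzer>
```

As with any char filter, this changes only what is indexed; the stored value is unaffected.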
Answered 2013-03-08T10:19:51.823