search - Querying numbers containing hyphens with SOLR WordDelimiterFilterFactory does not work?

Question

I am trying to configure Solr 4.0-BETA with a WordDelimiterFilterFactory so that I can query numbers that contain hyphens.

Field value: "123456-1234", added as ssn.

Queries:

"123456-1234" <- works (with hyphen)
"1234561234" <- does not work (without hyphen)

According to the documentation (AFAIUI), it should match, because the field type has generateNumberParts and catenateNumbers.

From the documentation:

generateNumberParts="1" causes number subwords to be generated: "500-42" => "500" "42"
catenateNumbers="1" causes maximum runs of number parts to be catenated: "500-42" => "50042"

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
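
To make the two quoted options concrete, here is a minimal Python sketch of the behavior the documentation describes. It is only a simplified illustration, not Solr's implementation: the function name `number_delimiter_tokens` and its parameters are hypothetical, and it only handles values made up entirely of digit groups separated by delimiters.

```python
import re


def number_delimiter_tokens(text, generate_number_parts=True, catenate_numbers=True):
    """Simplified, hypothetical model of the two documented options.

    This is NOT Solr's WordDelimiterFilter; it only mirrors the
    "500-42" example quoted from the documentation.
    """
    # Split on anything that is not a letter or digit (e.g. the hyphen).
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", text) if p]
    tokens = []
    if generate_number_parts:
        # Emit each numeric subword on its own: "500-42" => "500", "42".
        tokens.extend(p for p in parts if p.isdigit())
    if catenate_numbers and len(parts) > 1 and all(p.isdigit() for p in parts):
        # Emit the numeric parts joined together: "500-42" => "50042".
        tokens.append("".join(parts))
    return tokens


print(number_delimiter_tokens("500-42"))
print(number_delimiter_tokens("123456-1234"))
```

Under this reading of the documentation, the value "123456-1234" would be expected to yield "1234561234" as one of its terms, which is why the unhyphenated query was expected to match.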

My fields:

<fields>
     <field name="ssn" type="text_en_splitting" indexed="true" stored="false" multiValued="false" />
     <field name="ssn_exact" type="string" indexed="true" stored="true" multiValued="false" />
</fields>

<copyField source="ssn" dest="ssn_exact" />
<copyField source="ssn" dest="text" />

The filter in text_en_splitting:

 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

What am I missing here?

score 1 · Accepted Answer

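I created a similar field in my local schema and used the Analysis tool under Solr Admin (http://localhost:8983/solr/#/collection1/analysis). Note that this URL assumes Solr is running at http://localhost:8983/ and that your index is named collection1; modify as needed.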

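I ran your value through both index and query analysis, with text_en_splitting selected in the "Analyse Fieldname / FieldType" dropdown. You will see from the results that the value 1234561234 is never added as an index term for this field type.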
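
The same check can also be scripted rather than clicked through in the admin UI. The sketch below only builds the request URL; it assumes the field analysis request handler is registered at `/analysis/field` in your solrconfig.xml (an assumption about your setup, not something stated above), and the helper name `field_analysis_url` is hypothetical.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, index_value, query_value):
    """Build a field-analysis request URL (hypothetical helper).

    Assumes a field analysis handler is available at /analysis/field;
    adjust the path if your solrconfig.xml registers it elsewhere.
    """
    params = {
        "analysis.fieldtype": field_type,    # field type to analyze with
        "analysis.fieldvalue": index_value,  # value as it would be indexed
        "analysis.query": query_value,       # value as it would be queried
        "wt": "json",                        # response format
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


print(field_analysis_url(
    "http://localhost:8983/solr", "collection1",
    "text_en_splitting", "123456-1234", "1234561234",
))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should show the terms produced at index time and at query time, so you can compare field types side by side.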

However, if you use the text_en_splitting_tight field type, you get the behavior you want: the hyphen is removed and 1234561234 is added to the index as a term. So I would switch the field type as follows and reindex, and you should be all set.

<fields>
 <field name="ssn" type="text_en_splitting_tight" indexed="true" stored="false" multiValued="false" />
 <field name="ssn_exact" type="string" indexed="true" stored="true" multiValued="false" />
</fields>

<copyField source="ssn" dest="ssn_exact" />
<copyField source="ssn" dest="text" />
