Solr - Write the result of an analyzer to a different field

Question

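I have read some tutorials and gone through the Solr documentation, but one thing is still not clear to me. Let me explain:

Assume the following document should be indexed: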

<doc>
  <field name="id">R12345</field>
  <field name="title">My title</field>
  <field name="content">My Content</field>
</doc>

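In contrast to this document, the index should contain an additional field named "docType". This extra index field should be filled using a "completion rule". The idea behind it:

If the id starts with the character "R", write the string "Resolve" to the field docType in the index. If the id starts with the character "C", write the string "Contribute" to the field docType in the index.

The document above should then be available in the index with the following fields: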

id=R12345
title=My title
content=My Content
docType=Resolve

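My idea is to use an analyzer for this. The analyzer's result would be written to the index field "id" as usual (just a copy of the original text), but the result "Resolve" or "Contribute" should be written to another field.

My basic question is: how can I implement this in an analyzer (Java snippet)? To make it more complicated, the index field "docType" should be searchable and must be available in the search results. What would the schema for the fields id and docType look like?

Thanks in advance, Tobias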

score 7 · Accepted Answer

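If you only need the value indexed, a schema-only approach is sufficient. Create a new field type that does the necessary processing, create a field of that type, and set up a copyField to copy the value from id: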

<fieldType name="doctypeField" class="solr.TextField">
  <analyzer type="index">
    <!-- Keep the whole id as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Reduce the id to its leading C or R -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="([CR]).*" replacement="$1" replace="all" />
    <!-- Expand the single letter to the full docType value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="C" replacement="Contribute" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="R" replacement="Resolve" replace="all" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Queries are only lowercased, so "Resolve" and "resolve" both match -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="doctype" type="doctypeField" indexed="true" stored="false" required="false" />

<copyField source="id" dest="doctype"/>
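
To see what the index-time chain above would produce for a given id, here is a minimal plain-Java sketch that approximates it with `java.util.regex`. It does not use any Solr or Lucene classes; the class name `DocTypeChainSketch` and the method `indexedToken` are made up for illustration, and the sketch assumes each pattern filter behaves like `Matcher.replaceAll` applied to the whole token.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-in for the index analyzer; not a Solr API.
class DocTypeChainSketch {
    // Same patterns, in the same order, as the three pattern filters in the schema.
    private static final Pattern PREFIX = Pattern.compile("([CR]).*");
    private static final Pattern C = Pattern.compile("C");
    private static final Pattern R = Pattern.compile("R");

    static String indexedToken(String id) {
        // KeywordTokenizer step: the whole input stays one token.
        String token = id;
        // Keep only the matched C or R and drop everything after it.
        token = PREFIX.matcher(token).replaceAll("$1");
        // Expand the letter to the full value.
        token = C.matcher(token).replaceAll("Contribute");
        token = R.matcher(token).replaceAll("Resolve");
        // LowerCaseFilter step.
        return token.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(indexedToken("R12345")); // prints "resolve"
        System.out.println(indexedToken("C777"));   // prints "contribute"
    }
}
```

Because the patterns are not anchored, an id that merely contains a C or R is also rewritten in this sketch: `indexedToken("X1C2")` yields `x1contribute`. If ids outside the R/C scheme can occur, anchoring the first pattern (for example `^([CR]).*`) is probably worth considering.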

Note that you won't get a stored value from this: stored values are the original input, not the analyzer output, so storing doctype would return the copied id (for example R12345) rather than "Resolve". If you need the value in search results, work out the docType value before feeding the document to Solr, for instance by creating it in the SQL query if your content source is SQL.
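
If the indexing client is written in Java, working out docType before the document reaches Solr could look roughly like the sketch below. It is only an illustration under assumptions: the document is modelled as a plain `Map` rather than a Solr client type, and the names `DocTypeEnricherSketch`, `docTypeFor`, and `withDocType` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; not part of Solr or SolrJ.
class DocTypeEnricherSketch {
    // Applies the rule from the question: R -> Resolve, C -> Contribute.
    static String docTypeFor(String id) {
        if (id == null || id.isEmpty()) {
            return null;
        }
        switch (id.charAt(0)) {
            case 'R': return "Resolve";
            case 'C': return "Contribute";
            default:  return null; // no rule for this prefix
        }
    }

    // Returns a copy of the fields with docType added when a rule applies.
    static Map<String, Object> withDocType(Map<String, Object> fields) {
        Map<String, Object> out = new LinkedHashMap<>(fields);
        Object id = fields.get("id");
        String docType = (id == null) ? null : docTypeFor(id.toString());
        if (docType != null) {
            out.put("docType", docType);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "R12345");
        doc.put("title", "My title");
        doc.put("content", "My Content");
        System.out.println(withDocType(doc).get("docType")); // prints "Resolve"
    }
}
```

With this approach the schema would need an ordinary docType field declared with both indexed="true" and stored="true" (an assumption about your schema, not something shown above), so that the value is searchable and also returned in results.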
