solr - How do I query Solr for documents that match at least 50% of the query string?

Question

I am using Solr 7.6, and my documents are structured as follows:

[
{
    "source_ln":"en",
    "source_text":"the sky is blue",
    "target_ln":"hi",
    "target_text":"आसमान नीला है"
},
{
    "source_ln":"en",
    "source_text":"the sky is also called the celestial sphere",
    "target_ln":"hi",
    "target_text":"आकाश को आकाशीय क्षेत्र भी कहा जाता है"
}
]

All fields are defined with the StandardTokenizerFactory tokenizer.

When I query "source_text":"the sky",

the result set should contain only the first document.

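In the second document, the field "source_text":"the sky is also called the celestial sphere" contains 8 terms, while the query "source_text":"the sky" contains only 2 terms. The query therefore covers only 25% of that field, the 50% minimum match condition is not met, and the second document should not be in the result set. The first document has 4 terms, so the 2 query terms cover exactly 50% of it.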

Is there a way to match documents on at least 50% of the terms/tokens in the queried field?

Thanks in advance.

score 1 · Accepted Answer

You can set your request handler to use the (e)dismax query parser, for example via the defType parameter: ?q=...&defType=dismax

With the dismax parser you can use the mm (Minimum Should Match) parameter for this; just set mm=50%.
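As a rough illustration, here is a minimal Python sketch that builds such a request URL. The host, port, and core name ("translations") are assumptions made up for the example, and qf=source_text is assumed from the field in the question. Keep in mind that mm is evaluated against the optional clauses of the query, not against the number of terms stored in the document, so it is worth checking the results against your own data.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the host, port and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/translations/select"


def build_query_url(user_query: str, min_should_match: str = "50%") -> str:
    """Build a select URL that uses edismax with a Minimum Should Match value."""
    params = {
        "q": user_query,
        "defType": "edismax",    # or "dismax"
        "qf": "source_text",     # field to search, taken from the question
        "mm": min_should_match,  # share of query clauses that must match
    }
    # urlencode percent-encodes the "%" in "50%" as "%25".
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("the sky"))
```

The URL can then be fetched with any HTTP client; the same parameters can also be set as defaults on the request handler in solrconfig.xml.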

score 0 · Accepted Answer

You can achieve this with the following steps.

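Create a separate field in your schema named "source_text_fifty" with indexed=true and stored=false. Do not apply StandardTokenizerFactory to it; better, create a separate field type that uses solr.KeywordTokenizerFactory.
While indexing each document, compute the 50% portion of the input text and store that computed value in the "source_text_fifty" field.
Reindex all existing data using the logic above.
Run the query source_text_fifty:"the sky". It now returns only the documents with a 50% match.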
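The steps above can be sketched as follows. The answer does not say exactly how to "compute 50% of the input", so this sketch assumes it means the leading half of the whitespace-separated tokens (rounded up); the helper names first_half and add_fifty_field are hypothetical. With a keyword-tokenized field, the stored value is matched as one exact string.

```python
import math


def first_half(text: str) -> str:
    """Return the leading half of the whitespace-separated tokens (rounded up).

    Assumption: "50% of the input" is read as the first half of the tokens.
    """
    tokens = text.split()
    keep = math.ceil(len(tokens) / 2)
    return " ".join(tokens[:keep])


def add_fifty_field(doc: dict) -> dict:
    """Return a copy of the document with the extra source_text_fifty field."""
    enriched = dict(doc)
    enriched["source_text_fifty"] = first_half(doc["source_text"])
    return enriched


if __name__ == "__main__":
    docs = [
        {"source_ln": "en", "source_text": "the sky is blue"},
        {"source_ln": "en",
         "source_text": "the sky is also called the celestial sphere"},
    ]
    for doc in docs:
        print(add_fifty_field(doc)["source_text_fifty"])
```

Under this assumption, the first document is indexed with "the sky" and the second with "the sky is also", so an exact query for "the sky" on source_text_fifty would match only the first one. Note that this only finds queries equal to the stored prefix; it does not cover other combinations of terms.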
