lucene - Indexing with edge NGrams for typeahead

Question

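I'm trying to get Elasticsearch to index some documents for typeahead suggestions. As far as I can tell, edge NGram processing in Elasticsearch is provided by the underlying Lucene. Unfortunately, I find Lucene's documentation on this hard to follow. The best I've come up with is based on https://gist.github.com/988923, but it doesn't seem to work (an index created with these settings only returns matches for complete words, as if the settings weren't there):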

{
  "settings":{
    "index":{
      "analysis":{
        "analyzer":{
          "typeahead_analyzer":{
            "type":"custom",
            "tokenizer":"edgeNGram",
            "filter":["typeahead_ngram"]
          }
        },
        "filter":{
          "typeahead_ngram":{
            "type":"edgeNGram",
            "min_gram":1,
            "max_gram":8,
            "side":"front"
          }
        }
      }
    }
  }
}

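I don't really understand how analyzers, tokenizers, and filters fit together. Do I even want a filter? Should I have only a tokenizer? Do I have to reference these settings when I index the documents they should apply to? How do I find out which settings Lucene is actually using underneath for a given index? How do I debug this? Help :-)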

score 1 · Accepted Answer

I solved this using edgeNGram. Below are the analysis settings and the mapping I used to get it working.

{
  "analysis": {
    "analyzer": {
      "str_search_analyzer": {
        "tokenizer": "standard",
        "filter": ["lowercase"]
      },
      "str_index_analyzer": {
        "tokenizer": "standard",
        "filter": ["lowercase", "substring"]
      }
    },
    "filter": {
      "substring": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 10,
        "side": "front"
      }
    }
  }
}

{
  "index_name": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "name": {
        "type": "string",
        "index": "analyzed",
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}
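
To make it easier to see why two separate analyzers are used, here is a rough Python sketch of what the two chains above would emit. It is only an approximation: the regex split stands in for the real `standard` tokenizer, the function names are made up for illustration, and the sample text "New York" is an assumption, not data from my index.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for the "standard" tokenizer: split on non-word characters.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    # Mirrors the "lowercase" token filter.
    return [t.lower() for t in tokens]

def edge_ngrams(tokens, min_gram=1, max_gram=10):
    # Mirrors the "substring" filter: front-anchored prefixes of each token,
    # from min_gram up to max_gram characters (or the token length, if shorter).
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out

def index_analyze(text):
    # Approximates str_index_analyzer: tokenize, lowercase, then edge n-grams.
    return edge_ngrams(lowercase(standard_tokenize(text)))

def search_analyze(text):
    # Approximates str_search_analyzer: tokenize and lowercase only.
    return lowercase(standard_tokenize(text))

indexed = index_analyze("New York")
typed = search_analyze("New Yo")
print(indexed)  # ['n', 'ne', 'new', 'y', 'yo', 'yor', 'york']
print(typed)    # ['new', 'yo']
```

The prefixes are generated once at index time, so the search side only has to lowercase what the user typed. Each partial word then lines up with a prefix that is already stored in the index.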

One important footnote: I needed to use a match query with the AND operator for the queries to return the right results.
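
As a hedged illustration of that footnote, the request body might look something like the one built below. The `name` field comes from the mapping above; the helper name `build_typeahead_query` and the sample text "new yo" are hypothetical. A match query combines its terms with OR by default, so without the operator a document matching only "new" or only "yo" would presumably also be returned.

```python
import json

def build_typeahead_query(field, text):
    # Build a match query whose analyzed terms must ALL match (operator "and").
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",
                }
            }
        }
    }

body = build_typeahead_query("name", "new yo")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request against the index.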

Hope this helps.
