
I'm using Elasticsearch 7.2.0 and I want to build search suggestions.

For example, I have these 3 movie titles:

Avengers: Infinity War
Avengers: Infinity War Part 2
Avengers: Age of Ultron

When I type "aven", it should return suggestions like:

avengers
avengers infinity
avengers age

When I type "avengers inf", it should return:

avengers infinity war
avengers infinity war part 2

After going through a lot of Elasticsearch tutorials, this is what I ended up with:

Index:

PUT movies
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding",
              "trim"
            ],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          },
          "completion_analyzer": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": [
              "letter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keywordstring": {
            "type": "text",
            "analyzer": "keyword_analyzer"
          },
          "edgengram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "edge_ngram_search_analyzer"
          },
          "completion": {
            "type": "completion"
          }
        },
        "analyzer": "standard"
      },
      "completion_terms": {
        "type": "text",
        "fielddata": true,
        "analyzer": "completion_analyzer"
      }
    }
  }
}

The following documents:

POST movies/_doc/1
{
  "name": "Spider-Man: Homecoming",
  "completion_terms": [
    "spider-man",
    "homecomming"
  ]
}

POST movies/_doc/2
{
  "name": "Ant-man and the Wasp",
  "completion_terms": [
    "ant-man",
    "and",
    "the",
    "wasp"
  ]
}

POST movies/_doc/3
{
  "name": "Avengers: Infinity War Part 2",
  "completion_terms": [
    "avangers",
    "infinity",
    "war",
    "part",
    "2"
  ]
}

POST movies/_doc/4
{
  "name": "Captain Marvel",
  "completion_terms": [
    "captain",
    "marvel"
  ]
}

POST movies/_doc/5
{
  "name": "Black Panther",
  "completion_terms": [
    "black",
    "panther"
  ]
}

POST movies/_doc/6
{
  "name": "Avengers: Infinity War",
  "completion_terms": [
    "avangers",
    "infinity",
    "war"
  ]
}

POST movies/_doc/7
{
  "name": "Thor: Ragnarok",
  "completion_terms": [
    "thor",
    "ragnarok"
  ]
}

POST movies/_doc/8
{
  "name": "Guardians of the Galaxy Vol 2",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy",
    "vol",
    "2"
  ]
}

POST movies/_doc/9
{
  "name": "Doctor Strange",
  "completion_terms": [
    "doctor",
    "strange"
  ]
}

POST movies/_doc/10
{
  "name": "Captain America: Civil War",
  "completion_terms": [
    "captain",
    "america",
    "civil",
    "war"
  ]
}

POST movies/_doc/11
{
  "name": "Ant-Man",
  "completion_terms": [
    "ant-man"
  ]
}

POST movies/_doc/12
{
  "name": "Avengers: Age of Ultron",
  "completion_terms": [
    "avangers",
    "age",
    "of",
    "ultron"
  ]
}

POST movies/_doc/13
{
  "name": "Guardians of the Galaxy",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy"
  ]
}

POST movies/_doc/14
{
  "name": "Captain America: The Winter Soldier",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "winter",
    "solider"
  ]
}

POST movies/_doc/15
{
  "name": "Thor: The Dark World",
  "completion_terms": [
    "thor",
    "the",
    "dark",
    "world"
  ]
}

POST movies/_doc/16
{
  "name": "Iron Man 3",
  "completion_terms": [
    "iron",
    "man",
    "3"
  ]
}

POST movies/_doc/17
{
  "name": "Marvel’s The Avengers",
  "completion_terms": [
    "marvels",
    "the",
    "avangers"
  ]
}

POST movies/_doc/18
{
  "name": "Captain America: The First Avenger",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "first",
    "avanger"
  ]
}

POST movies/_doc/19
{
  "name": "Thor",
  "completion_terms": [
    "thor"
  ]
}

POST movies/_doc/20
{
  "name": "Iron Man 2",
  "completion_terms": [
    "iron",
    "man",
    "2"
  ]
}

POST movies/_doc/21
{
  "name": "The Incredible Hulk",
  "completion_terms": [
    "the",
    "incredible",
    "hulk"
  ]
}

POST movies/_doc/22
{
  "name": "Iron Man",
  "completion_terms": [
    "iron",
    "man"
  ]
}

And the query:

POST movies/_search
{
  "suggest": {
    "movie-suggest-fuzzy": {
      "prefix": "avan",
      "completion": {
        "field": "name.completion",
        "fuzzy": {
          "fuzziness": 1
        }
      }
    }
  }
}

My query returns full titles instead of fragments.


1 Answer


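This is a good question, and it shows you have done a lot of research to get this working. But you don't need to make it this complicated by trying to handle everything inside ES. I had exactly the same use case and solved it by combining application-side logic with ES.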

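What you actually need is a match query on the first (n-1) terms of the search input and a prefix query on the nth term. For "aven", the first term is also the nth term, so only the prefix query applies. For "avengers inf", "avengers" goes into the match query and the prefix query is on "inf".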

I just indexed the documents you provided and tried both of the search inputs you mentioned, and it works:

Index creation

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            }
        }
    }
}

Indexing 3 documents

{
  "name" : "Avengers: Age of Ultron"
},
{
  "name" : "Avengers: Infinity War Part 2"
},
{
  "name" : "Avengers: Infinity War"
}

Search query (a match query on the first (n-1) terms and a prefix query on the nth term; here the input is "avengers ag")

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "avengers"
                    }
                },
                {
                    "prefix": {
                        "name": "ag"
                    }
                }
            ]
        }
    }
}

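Basically, in your application code you need to split the search input on whitespace, then build a bool query with match clauses for the first (n-1) terms and a prefix query for the nth term.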
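The splitting step could look roughly like the Python sketch below. The function name build_suggest_query and the default field "name" are illustrative choices rather than part of any client library, and the code only builds the request body without sending it. One assumption worth calling out: the prefix query is a term-level query and is not analyzed, so the sketch lowercases the last fragment to line up with the lowercased tokens that the default standard analyzer puts in the index.

```python
import json


def build_suggest_query(user_input, field="name"):
    """Build a bool query: match on the first n-1 terms, prefix on the nth."""
    terms = user_input.split()  # split on any run of whitespace
    if not terms:
        return None  # nothing typed yet, so there is nothing to search for

    *complete_terms, last_term = terms

    # One match clause per fully typed term (hypothetical layout; a single
    # match clause holding all n-1 terms would be another option).
    must = [{"match": {field: term}} for term in complete_terms]

    # The prefix query is not analyzed, so lowercase the fragment by hand.
    must.append({"prefix": {field: last_term.lower()}})

    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(json.dumps(build_suggest_query("avengers inf"), indent=2))
```

The resulting dictionary can be passed as the body of a _search request against the index. For a single-word input such as "aven", the must list contains only the prefix clause.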

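Note that you don't even need edge n-gram analyzers and other complex machinery at index time, which saves a lot of space in the index. You may, however, want to put a limit on the number of characters in the prefix query, because it can be expensive when searching across millions of documents: unlike a match query, it is not a token-to-token match.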
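One way to read the character limit is as a minimum length for the fragment before a prefix query is issued at all, since very short prefixes expand to the most terms. The sketch below takes that reading; the helper name prefix_clause and the default threshold of 2 characters are assumptions for illustration, not values taken from Elasticsearch.

```python
def prefix_clause(term, field="name", min_chars=2):
    """Return a prefix clause, or None when the fragment is too short."""
    term = term.strip().lower()  # prefix queries are not analyzed
    if len(term) < min_chars:
        # Too short: skip the prefix query instead of scanning many terms.
        return None
    return {"prefix": {field: term}}
```

When the helper returns None, the application can simply wait for more input before calling ES.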

answered 2020-01-13T06:14:10.340