elasticsearch - ElasticSearch starts with query for autocomplete feature

Question

I want to build an autocomplete feature using ElasticSearch and C#, but I am not getting the desired result. For demo purposes, this is what I have done.

1) Created an index called "names":

PUT names?pretty

2) Added 20 entries using POST commands like this one:

POST names/_doc/1
{
  "name" : "John Smith"
}

3) List of Names:

[ "John Smith", "John Smitha", "John Smithb", "John Smithc", "John Smithd", "John Smithe", "John Smithf",
  "John Smithg", "John Smithh", "John Smithi", "Smith John", "Smitha John", "Smithb John", "Smithc John",
  "Smithd John", "Smithe John", "Smithf John", "Smithg John", "Smithh John", "Smithi John" ]

4) When I run a prefix query:

GET names/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}

I expect to get back "Smith John", "Smitha John"..., but I am getting back "John Smith", "John Smitha"...

What am I doing wrong? What do I need to change and where?

score 1 · Accepted Answer

Your name field is mapped as a text field, which by default uses the standard analyzer. That analyzer splits the text into tokens and converts them to lowercase. You can verify this with the analyze API of ES.

Tokens example for keyword analyzer

URL: http://{{hostname}}:{{port}}/{{index}}/_analyze

{
  "text": "John Smith",
  "analyzer" : "keyword"
}

The output of the above API:

{
    "tokens": [
        {
            "token": "John Smith",
            "start_offset": 0,
            "end_offset": 10,
            "type": "word",
            "position": 0
        }
    ]
}

Notice that the keyword analyzer does not split the text; it stores the whole value as a single token, unchanged, as explained in the official ES docs.

Tokens with standard analyzer

{
  "text": "John Smith",
  "analyzer" : "standard"
}

The output of the above API:

{
    "tokens": [
        {
            "token": "john",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "smith",
            "start_offset": 5,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

The prefix query is not analyzed: its value is sent to ES as it is, so Smith (note the capital S) is compared against the indexed tokens unchanged. With the updated mapping below, each name is indexed as a single token, so only documents whose name starts with Smith have that prefix, and only these appear in the search results.
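To make the token-level behaviour concrete, here is a minimal Python sketch that imitates the two analyzers and a term-level prefix match. It is a simplified model, not Elasticsearch's actual implementation: the function names standard_tokens, keyword_tokens and prefix_matches are hypothetical, and the "standard" stand-in only splits on non-word characters and lowercases.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword_tokens(text):
    """Stand-in for the keyword analyzer: the whole input is one unchanged token."""
    return [text]


def prefix_matches(tokens, prefix):
    """Term-level prefix match: the prefix is compared as-is against each token."""
    return any(tok.startswith(prefix) for tok in tokens)


names = ["John Smith", "Smith John"]

for name in names:
    print(name, "->", standard_tokens(name), "vs", keyword_tokens(name))

# Against word tokens, a prefix can match a word anywhere in the name.
standard_hits = [n for n in names if prefix_matches(standard_tokens(n), "smith")]

# Against a single whole-value token, a prefix only matches the start of the name.
keyword_hits = [n for n in names if prefix_matches(keyword_tokens(n), "Smith")]

print("standard:", standard_hits)
print("keyword:", keyword_hits)
```

In this model, a prefix run against per-word tokens cannot tell "John Smith" from "Smith John", because both contain a token starting with the prefix. Only the single-token form ties the prefix to the start of the full name.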

Mapping

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "keyword"
            }
        }
    }
}

Search Query

{
    "query": {
        "prefix": {
            "name": {
                "value": "Smith"
            }
        }
    }
}

EDIT: Updated the settings based on the OP's comments. With the above mapping and search query, only results that start with Smith are returned, as shown in the output below:

{
  "took": 811,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0,
        "_source": {
          "name": "Smith John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "7",
        "_score": 1.0,
        "_source": {
          "name": "Smithb John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "8",
        "_score": 1.0,
        "_source": {
          "name": "Smithc John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "9",
        "_score": 1.0,
        "_source": {
          "name": "Smithd John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "10",
        "_score": 1.0,
        "_source": {
          "name": "Smithe John"
        }
      }
    ]
  }
}

score 1 · Accepted Answer

You need to run your prefix query on the name.keyword field and not on the name field.

GET names/_search
{
  "query": {
    "prefix": {
      "name.keyword": {
        "value": "Smith"
      }
    }
  }
}

The reason is that the name.keyword field is of type keyword and is not analyzed (i.e. one token, John Smith, is indexed), so you can run an exact match or prefix match query on it. The name field is of type text and is analyzed (i.e. two tokens, john and smith, are indexed), so your exact match (or prefix match) query doesn't work as expected on it.
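For the autocomplete use case, the client has to build this request body from whatever the user has typed so far. The following is a minimal Python sketch of that step (the question uses C#, but the body has the same shape). The helper name build_prefix_query and the size default are assumptions for illustration, and it assumes the index has a name.keyword subfield, which is what the default dynamic mapping for strings creates.

```python
import json


def build_prefix_query(field, typed_text, size=10):
    """Build a search body running a prefix query on `field` (hypothetical helper)."""
    if not typed_text:
        raise ValueError("typed_text must not be empty")
    return {
        "size": size,  # cap the number of suggestions returned
        "query": {
            "prefix": {
                field: {"value": typed_text}
            }
        },
    }


# The value is passed through unchanged, so its casing must match the stored names.
body = build_prefix_query("name.keyword", "Smith")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of GET names/_search with any HTTP or Elasticsearch client.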

You can read more about it here
