postgresql - Elasticsearch search on phone numbers

Question

I have a Postgres array column that I want to index in Elasticsearch and then use in search. Here is an example:

phones = [ "+175 (2) 123-25-32", "123456789", "+12 111-111-11" ]

I analyzed the values with the analyze API, and Elasticsearch is tokenizing the field into multiple tokens, as follows:

curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : [ "+175 (2) 123-25-32", "123456789", "+12 111-111-11" ]
}'


{
  "tokens": [
    {
      "token": "analyzer",
      "start_offset": 6,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "standard",
      "start_offset": 19,
      "end_offset": 27,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "text",
      "start_offset": 33,
      "end_offset": 37,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "175",
      "start_offset": 45,
      "end_offset": 48,
      "type": "<NUM>",
      "position": 4
    },
    {
      "token": "2",
      "start_offset": 50,
      "end_offset": 51,
      "type": "<NUM>",
      "position": 5
    },
    {
      "token": "123",
      "start_offset": 53,
      "end_offset": 56,
      "type": "<NUM>",
      "position": 6
    },
    {
      "token": "25",
      "start_offset": 57,
      "end_offset": 59,
      "type": "<NUM>",
      "position": 7
    },
    {
      "token": "32",
      "start_offset": 60,
      "end_offset": 62,
      "type": "<NUM>",
      "position": 8
    },
    {
      "token": "123456789",
      "start_offset": 66,
      "end_offset": 75,
      "type": "<NUM>",
      "position": 9
    },
    {
      "token": "12",
      "start_offset": 80,
      "end_offset": 82,
      "type": "<NUM>",
      "position": 10
    },
    {
      "token": "111",
      "start_offset": 83,
      "end_offset": 86,
      "type": "<NUM>",
      "position": 11
    },
    {
      "token": "111",
      "start_offset": 87,
      "end_offset": 90,
      "type": "<NUM>",
      "position": 12
    },
    {
      "token": "11",
      "start_offset": 91,
      "end_offset": 93,
      "type": "<NUM>",
      "position": 13
    }
  ]
}

I want Elasticsearch either to skip tokenization and store the numbers without special characters (e.g. "+175 (2) 123-25-32" converted into "+17521232532"), or simply to index each number as is, so that it is available in search results.

My mapping is as follows:

{ :id => { :type => "string"}, :secondary_phones => { :type => "string" } }

Here is how I am trying to do the query:

      settings = {
        query: {
          filtered: {
            filter: {
              bool: {
                should: [
                  { terms: { phones: [ "+175 (2) 123-25-32", "123456789", "+12 111-111-11" ] } },
                ]
              }
            }
          }
        },
        size: 100,
      }

P.S. I have also tried removing the special characters, but no luck.

I am sure it is achievable and I am missing something. Suggestions please.

Thanks.

score 0 · Accepted Answer

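If you only want exact matches on the data, as in your terms query example, the best approach is simply to set the index mapping parameter to not_analyzed in your mapping. See the Elasticsearch mapping documentation for details.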
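As a rough sketch of that mapping, written in the same Ruby hash style as the question: the field names are taken from the question's mapping, and the `phone_mapping` method name is only illustrative. It assumes an Elasticsearch version that still has the `string` type (before 5.0); on later versions the closest equivalent is, as far as I know, a field of type `keyword`.

```ruby
# Mapping sketch for Elasticsearch versions that still use the "string" type.
# index: "not_analyzed" tells Elasticsearch to skip analysis for the field,
# so every array element is indexed as one exact term.
# phone_mapping is a hypothetical helper name, not part of any library.
def phone_mapping
  {
    id: { type: "string" },
    secondary_phones: { type: "string", index: "not_analyzed" }
  }
end
```

Note that the question's mapping declares `secondary_phones` while the query filters on `phones`; the terms filter has to name the field that actually carries the not_analyzed mapping. Also, the index setting of an existing field generally cannot be changed in place, so the index will probably need to be recreated and the data reindexed.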

This completely disables analysis (tokenization) of the values and treats the field's content (each item in the array) as a single token/keyword.
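Because a not_analyzed field only matches values that are identical character for character, one option is to normalize the numbers yourself, both before indexing and before querying. The following is a minimal sketch under that assumption; `normalize_phone` and `phone_terms_filter` are hypothetical helpers, not part of any Elasticsearch client.

```ruby
# Hypothetical helper: keep an optional leading "+" and drop every other
# non-digit character, e.g. "+175 (2) 123-25-32" becomes "+17521232532".
def normalize_phone(raw)
  stripped = raw.strip
  prefix = stripped.start_with?("+") ? "+" : ""
  prefix + stripped.gsub(/\D/, "")
end

# Hypothetical helper: build a terms filter whose values are normalized the
# same way as the indexed values, so exact term matching can succeed.
def phone_terms_filter(field, raw_phones)
  { terms: { field => raw_phones.map { |phone| normalize_phone(phone) } } }
end

phones = ["+175 (2) 123-25-32", "123456789", "+12 111-111-11"]
normalized = phones.map { |phone| normalize_phone(phone) }
# normalized => ["+17521232532", "123456789", "+1211111111"]
```

The normalized array is what would be sent to Elasticsearch at index time, and `phone_terms_filter(:secondary_phones, phones)` would take the place of the hand-written terms clause in the query.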
