
I added 15k records to the Elasticsearch index products_idx1, with type product.

One record has the product name apple iphone 6, but when I search for iphone6 it returns empty data.

This is my code using the PHP Elasticsearch client:

<?php

use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$client = ClientBuilder::create()->build();

// Fields searched by the multi_match clause.
$values = ['name', 'name.prefix', 'name.suffix', 'sku'];

$params = [
    'index'  => 'products_idx1',
    // Per-request client options (certificate verification, connect timeout in seconds).
    'client' => ['verify' => 1, 'connect_timeout' => 5],
    'from'   => 0,
    'size'   => 25,
    'body'   => [
        'query' => [
            'bool' => [
                // With only should clauses, a document matches if at least one clause matches.
                'should' => [
                    ['multi_match' => ['query' => 'iphone6', 'type' => 'cross_fields', 'fields' => $values, 'operator' => 'OR']],
                    ['match' => ['all' => ['query' => 'iphone6', 'operator' => 'OR', 'fuzziness' => 'AUTO']]],
                ],
            ],
        ],
        'sort' => ['_score' => ['order' => 'desc']],
    ],
];

$response = $client->search($params);
echo "<pre>";
print_r($response);

2 Answers


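Since my other answer is already very long, I am adding the information about the Analyze API in a separate answer, both for readability and for people who are not very familiar with analyzers in Elasticsearch and how they work.

On my previous answer, @Niraj mentioned that other documents were working but that he was having an issue with the iphone6 query. The Analyze API is very useful for debugging this kind of problem.

First, check the index-time tokens of the document that you think should match your search query, in this case apple iphone 6: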

POST http://{{hostname}}:{{port}}/{{index}}/_analyze

{
"text" : "apple iphone 6",
"analyzer" : "text_analyzer"
}

And the generated tokens:

{
    "tokens": [
        {
            "token": "apple",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "appleiphone",
            "start_offset": 0,
            "end_offset": 12,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "iphone",
            "start_offset": 6,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "iphone6", //note this carefully
            "start_offset": 6,
            "end_offset": 14,
            "type": "shingle",
            "position": 1,
            "positionLength": 2
        },
        {
            "token": "6",
            "start_offset": 13,
            "end_offset": 14,
            "type": "<NUM>",
            "position": 2
        }
    ]
}

You can see that the analyzer we used also creates iphone6 as a token. Now check the search-time tokens:

{
  "text" : "iphone6",
  "analyzer" : "text_analyzer"
}

And the tokens:

{
    "tokens": [
        {
            "token": "iphone6",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

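Notice that the search-time token iphone6 is also present among the index-time tokens. That is why the document matches the search query, as already shown in the complete example in my first answer.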
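The comparison above can also be done mechanically. The following is a minimal Python sketch, not part of the original answer: it assumes you have already parsed the two `_analyze` responses into dictionaries, and the helper name `token_set` is hypothetical. The responses are trimmed to the `token` field only.

```python
def token_set(analyze_response):
    """Collect the token strings from a parsed _analyze response."""
    return {entry["token"] for entry in analyze_response["tokens"]}


# Index-time tokens for "apple iphone 6" (trimmed copy of the response above).
index_response = {
    "tokens": [
        {"token": "apple"},
        {"token": "appleiphone"},
        {"token": "iphone"},
        {"token": "iphone6"},
        {"token": "6"},
    ]
}

# Search-time tokens for "iphone6" (trimmed copy of the response above).
search_response = {"tokens": [{"token": "iphone6"}]}

# A match query can only hit the document if at least one token is shared.
shared = token_set(index_response) & token_set(search_response)
print(shared)
```

If `shared` comes back empty for your own index and search term, the analyzer is the first place to look.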

answered 2020-09-12T02:47:23.520

Using the shingle and pattern_replace token filters, it is possible to get results for all three search terms mentioned in the question and comments, namely iphone, iphone6 and appleiphone. A complete example follows.

As explained in the comment, the search-time tokens generated from the search term must match the index-time tokens generated from the indexed doc in order to get a search result. This is what I achieved by creating the custom analyzer.

Index mapping

{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "shingle",
            "lowercase",
            "space_filter"
          ]
        }
      },
      "filter": {
        "space_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": "",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "text_analyzer"
      }
    }
  }
}
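To get a feel for what this analyzer chain produces, here is a rough pure-Python approximation. It is only a sketch and not how Elasticsearch implements analysis: the function name `text_analyzer` is borrowed from the mapping for readability, the regular expression is a simplified stand-in for the standard tokenizer, and the shingle step assumes the filter's default settings (two-word shingles, unigrams kept).

```python
import re


def text_analyzer(text, max_shingle_size=2):
    """Approximate the chain: tokenize, shingle, lowercase, remove spaces."""
    # Simplified stand-in for the standard tokenizer: runs of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    tokens = []
    for i in range(len(words)):
        # Emit the single word, then the word n-grams (shingles) starting at it.
        for size in range(1, max_shingle_size + 1):
            if i + size <= len(words):
                shingle = " ".join(words[i:i + size])
                # Lowercase, then mimic space_filter (pattern " " replaced by "").
                tokens.append(shingle.lower().replace(" ", ""))
    return tokens


index_tokens = text_analyzer("apple iphone 6")
print(index_tokens)

for term in ("iphone", "iphone6", "appleiphone"):
    search_tokens = text_analyzer(term)
    # The term can match only if it shares a token with the indexed title.
    print(term, bool(set(search_tokens) & set(index_tokens)))
```

Under these assumptions, all three search terms share a token with the indexed title, which is the behaviour the custom analyzer is meant to provide.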

Index your sample doc

{
  "title" : "apple iphone 6" 
}

Search query for appleiphone with result

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "appleiphone"
          }
        }
      ]
    }
  }
}

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

Search query for iphone6 with result

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone6"
          }
        }
      ]
    }
  }
}

Result

 "hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

And last but not least, the search query for iphone

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone"
          }
        }
      ]
    }
  }
}

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]
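The three queries above differ only in the search term, so they can be generated from one helper. This is a small illustrative Python sketch; the function name `match_query` is hypothetical, and it only builds the request body without sending anything to Elasticsearch.

```python
import json


def match_query(field, term):
    """Build the same bool/should/match body used in the queries above."""
    return {"query": {"bool": {"should": [{"match": {field: term}}]}}}


for term in ("appleiphone", "iphone6", "iphone"):
    # Print each body as JSON, ready to send to the _search endpoint.
    print(json.dumps(match_query("title", term), indent=2))
```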
answered 2020-09-11T12:11:05.760