elasticsearch - 测试弹性搜索自定义分析器 - 管道分隔的关键字

Question

我有这个索引pipe作为自定义分析器。当我尝试测试它时，它会返回每个字符，而不是管道分隔的单词。

我正在尝试为我的输入行keywords看起来像这样的用例构建：crockpot refried beans|corningware replacement|crockpot lids|recipe refried beansEL 将在它被分解后返回匹配项。

{
  "keywords": {
    "aliases": {

    },
    "mappings": {
      "cloud": {
        "properties": {
          "keywords": {
            "type": "text",
            "analyzer": "pipe"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "keywords",
        "creation_date": "1513890909384",
        "analysis": {
          "analyzer": {
            "pipe": {
              "type": "custom",
              "tokenizer": "pipe"
            }
          },
          "tokenizer": {
            "pipe": {
              "pattern": "|",
              "type": "pattern"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "DOLV_FBbSC2CBU4p7oT3yw",
        "version": {
          "created": "6000099"
        }
      }
    }
  }
}

当我尝试按照本指南对其进行测试时。

curl -XPOST 'http://localhost:9200/keywords/_analyze' -d '{
 "analyzer": "pipe",
 "text": "pipe|pipe2"
}'

我逐个返回结果。

{
  "tokens": [
    {
      "token": "p",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "i",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "p",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "e",
      "start_offset": 3,
      "end_offset": 4,
      "type": "word",
      "position": 3
    },

score 1 · Accepted Answer

干得好，你快到了。由于管道|字符是正则表达式中的保留字符，因此您需要像这样对其进行转义：

      "tokenizer": {
        "pipe": {
          "pattern": "\\|",   <--- change this
          "type": "pattern"
        }
      }

然后你的分析器将工作并产生这个：

{
  "tokens": [
    {
      "token": "pipe",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "pipe2",
      "start_offset": 5,
      "end_offset": 10,
      "type": "word",
      "position": 1
    }
  ]
}

elasticsearch - 测试弹性搜索自定义分析器 - 管道分隔的关键字

1 回答 1

Related

Reference