
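I have an alternate-spellings file for terms in my index. I want to generate bigrams that include the alternate spellings of a particular term. For example, my alternate-spellings CSV file contains biriyani, biryani, briyani, and my field contains the text Chicken Biryani. I want to be able to produce the tokens chicken biryani, chicken biriyani, chicken briyani.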
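To pin down the desired output, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature. It assumes the CSV holds one comma-separated group of spellings per row, and it uses whitespace splitting plus lowercasing as a stand-in for the analyzer. The names SPELLINGS_CSV, load_groups and expanded_bigrams are hypothetical. Each position is expanded to its spelling group, and every term at one position is paired with every term at the next.

```python
import csv
import io
from itertools import product

# Hypothetical contents of the alternate-spellings CSV: one group per row.
SPELLINGS_CSV = "biriyani, biryani, briyani\n"


def load_groups(text):
    """Map each spelling to the full list of spellings in its group."""
    groups = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        variants = [v.strip().lower() for v in row if v.strip()]
        for v in variants:
            groups[v] = variants
    return groups


def expanded_bigrams(text, groups):
    """Lowercase, split on whitespace, expand each term, pair adjacent positions."""
    # Each position holds the original term or its whole spelling group.
    positions = [groups.get(t, [t]) for t in text.lower().split()]
    bigrams = []
    for left, right in zip(positions, positions[1:]):
        bigrams.extend(f"{a} {b}" for a, b in product(left, right))
    return bigrams


groups = load_groups(SPELLINGS_CSV)
print(expanded_bigrams("Chicken Biryani", groups))
# ['chicken biriyani', 'chicken biryani', 'chicken briyani']
```

Bigrams computed this way could be sent in a separate field at index time, but that moves the expansion out of the analyzer and into the application.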

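Right now, if I use the whitespace tokenizer with a synonym filter, I get the expected tokens chicken, biriyani, biryani, briyani. If I then apply a shingle filter, the resulting tokens are chicken, chicken biryani, biryani, biryani biriyani, biriyani, biriyani briyani, briyani. This token stream contains shingles built from the word's own synonyms, which should not be there. It also lacks tokens of the form chicken [alternate spelling of biryani], such as chicken biriyani or chicken briyani.

If I put the shingle filter before the synonym filter, the synonyms are only added as unigrams: chicken, chicken biryani, biriyani, biryani, briyani. Is there a way to generate shingles that include the synonyms occupying the same position as the original token, which in this case would be chicken biryani, chicken biriyani, chicken briyani?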
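The difference between what the shingle filter produces and what is wanted can be modeled with a toy Python sketch. This is not Lucene's implementation. It only reproduces the behavior reported above, assuming the synonym filter emits the original term first and then stacks the alternates at the same position. flat_bigrams pairs tokens in stream order and ignores positions, which yields the observed bigrams. positional_bigrams pairs every token at position p with every token at position p + 1, which yields the wanted ones. Both function names and STREAM are hypothetical.

```python
# Token stream as (term, position) pairs; stacked synonyms share a position.
# Order assumed: original term first, then its alternates.
STREAM = [("chicken", 0), ("biryani", 1), ("biriyani", 1), ("briyani", 1)]


def flat_bigrams(stream):
    """Pair each token with the next one in stream order, ignoring positions."""
    terms = [term for term, _ in stream]
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


def positional_bigrams(stream):
    """Pair every token at position p with every token at position p + 1."""
    by_pos = {}
    for term, pos in stream:
        by_pos.setdefault(pos, []).append(term)
    bigrams = []
    for pos in sorted(by_pos):
        for a in by_pos[pos]:
            for b in by_pos.get(pos + 1, []):
                bigrams.append(f"{a} {b}")
    return bigrams


print(flat_bigrams(STREAM))
# ['chicken biryani', 'biryani biriyani', 'biriyani briyani']
print(positional_bigrams(STREAM))
# ['chicken biryani', 'chicken biriyani', 'chicken briyani']
```

The first result matches the bigrams reported for the synonym-then-shingle chain. The second is the desired set, which requires treating same-position tokens as alternatives rather than as neighbors.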

Example test setup:

PUT test_bigram
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "biriyani, biryani, briyani"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "synonym"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          },
          "shingle_synonym": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "shingle",
              "synonym"
            ]
          },
          "synonym_shingle": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "synonym",
              "shingle"
            ]
          }
        }
      }
    }
  }
}

I am running Elasticsearch 5.6.
