elasticsearch - Creating a custom analyzer with a large char_filter list in Elasticsearch

Question

I am trying to add a custom analyzer to Elasticsearch. I have a very large list of synonym "mappings" (mapper_list), containing about 30,000 elements.

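import json
import requests

# es_host and mapper_list are defined elsewhere; es_host is assumed to include the index name.
# The index must be closed before its analysis settings can be changed.
requests.post(es_host + '/_close')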

settings = {
    "settings" : {
        "analysis" : {
            "char_filter" : {
                "my_mapping" : {
                    "type" : "mapping",
                    "mappings" : mapper_list
                }
            },
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_mapping"]
                }
            }
        }
    }
}

requests.put(es_host + '/_settings',
             data=json.dumps(settings))

requests.post(es_host + '/_open')

Error message from Elasticsearch:

[test-index] IndexCreationException[failed to create index]; nested: ArrayIndexOutOfBoundsException[256];
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:313)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:174)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Any suggestions on how to solve this would be appreciated.

ES version info:

  "version" : {
    "number" : "2.4.1",
    "build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
    "build_timestamp" : "2016-09-27T18:57:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  }

score 0 · Accepted Answer

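I think the error is caused by a mapping that contains a very long string. What exactly are you trying to map? If you look at the source code, there is a limit of 256 characters, and you are exceeding it. I get the same exception

ArrayIndexOutOfBoundsException[256]

if I try to map a long string: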

{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": ["More than 256 characters. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. => exception will be thrown"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mapping"
          ]
        }
      }
    }
  }
}

I don't know your use case, but you need to shorten the strings you are mapping; after that it should work.
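
To find which of the ~30,000 entries are the problem, a small check like the one below might help. This is only a sketch: the helper name find_oversized_rules is made up for illustration, and it assumes that each rule has the form "key => value" and that the 256-character limit mentioned above applies to each side of the rule separately.

```python
MAX_SIDE_LEN = 256  # limit reported in this answer; assumed to apply per side of "=>"


def find_oversized_rules(mapper_list, limit=MAX_SIDE_LEN):
    """Return (index, rule) pairs where either side of 'key => value' is longer than limit."""
    oversized = []
    for i, rule in enumerate(mapper_list):
        # Split on the first "=>"; a rule without "=>" leaves value empty.
        key, _sep, value = rule.partition("=>")
        if len(key.strip()) > limit or len(value.strip()) > limit:
            oversized.append((i, rule))
    return oversized


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the question.
    sample = ["colour => color", "x" * 300 + " => y"]
    for index, rule in find_oversized_rules(sample):
        print(index, len(rule))
```

Running this over mapper_list before sending the settings should point to the entries that need to be shortened or dropped.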
