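Jest provides a great asynchronous API for elasticsearch, and we find it very useful. However, it sometimes turns out that the requests it produces differ slightly from what we expect.

Usually we don't care, since everything works fine, but in this case it did not.

I want to create an index with a custom ngram analyzer. When I do this following the elasticsearch REST API docs, I make the call below: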

curl -XPUT 'localhost:9200/test' --data '
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "keyword_search": {
          "type":     "edge_ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "keyword": {
          "type":      "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "keyword_search"
          ]
        }
      }
    }
  }
}'

Then I confirm that the analyzer is configured correctly:

curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'

In response I receive multiple tokens, such as exp, expe, expec, and so on.
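
To make the expected tokens concrete, here is a minimal plain-Java sketch of what the analyzer above is configured to do: split on whitespace, lowercase, then emit prefixes between min_gram and max_gram characters long. The class and method names are made up for illustration; this only imitates the behavior and is not Elasticsearch's own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramSketch {

    // Imitates: whitespace tokenizer -> lowercase filter -> edge n-grams
    // of length minGram..maxGram. Hypothetical helper, not an Elasticsearch API.
    public static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            // Words shorter than minGram produce no tokens in this sketch.
            int longest = Math.min(maxGram, lower.length());
            for (int len = minGram; len <= longest; len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same parameters as the keyword_search filter: min_gram 3, max_gram 15.
        System.out.println(analyze("Expecting many tokens", 3, 15));
    }
}
```

With the settings from the PUT request, the first tokens this produces are exp, expe and expec, which matches what the correctly configured index returns.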

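Now, using the Jest client, I put the configuration JSON into a file on my classpath, with exactly the same content as the body of the PUT request above. I execute a Jest action constructed like this: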

new CreateIndex.Builder(name)
            .settings(
                    ImmutableSettings.builder()
                            .loadFromClasspath(
                                    "settings.json"
                            ).build().getAsMap()
            ).build();

Results

  • Primo - checking with tcpdump, what is actually posted to elasticsearch is (pretty-printed):

    {
      "settings.analysis.filter.keyword_search.max_gram": "15",
      "settings.analysis.filter.keyword_search.min_gram": "3",
      "settings.analysis.analyzer.keyword.tokenizer": "whitespace",
      "settings.analysis.filter.keyword_search.type": "edge_ngram",
      "settings.number_of_shards": "3",
      "settings.analysis.analyzer.keyword.filter.0": "lowercase",
      "settings.analysis.analyzer.keyword.filter.1": "keyword_search",
      "settings.analysis.analyzer.keyword.type": "custom"
    }
    
  • Secundo - the resulting index settings are:

    {
      "test": {
        "settings": {
          "index": {
            "settings": {
              "analysis": {
                "filter": {
                  "keyword_search": {
                    "type": "edge_ngram",
                    "min_gram": "3",
                    "max_gram": "15"
                  }
                },
                "analyzer": {
                  "keyword": {
                    "filter": [
                      "lowercase",
                      "keyword_search"
                    ],
                    "type": "custom",
                    "tokenizer": "whitespace"
                  }
                }
              },
              "number_of_shards": "3"   <-- the only difference from the one created with rest call
            },
            "number_of_shards": "3",
            "number_of_replicas": "0",
            "version": {"created": "1030499"},
            "uuid": "Glqf6FMuTWG5EH2jarVRWA"
          }
        }
      }
    }
    
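  • Tertio - checking the analyzer with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens' I get only ONE token!

Question 1. Why does Jest post a processed version of my settings instead of the original settings JSON?

Question 2. Why do the settings generated by Jest not work?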

1 Answer

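Glad you find Jest useful; please see my answers below.

Question 1. Why does Jest post a processed version of my settings instead of the original settings JSON?

It is not Jest but Elasticsearch's ImmutableSettings that does this; see: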

    Map test = ImmutableSettings.builder()
            .loadFromSource("{\n" +
                    "  \"settings\": {\n" +
                    "    \"number_of_shards\": 3,\n" +
                    "    \"analysis\": {\n" +
                    "      \"filter\": {\n" +
                    "        \"keyword_search\": {\n" +
                    "          \"type\":     \"edge_ngram\",\n" +
                    "          \"min_gram\": 3,\n" +
                    "          \"max_gram\": 15\n" +
                    "        }\n" +
                    "      },\n" +
                    "      \"analyzer\": {\n" +
                    "        \"keyword\": {\n" +
                    "          \"type\":      \"custom\",\n" +
                    "          \"tokenizer\": \"whitespace\",\n" +
                    "          \"filter\": [\n" +
                    "            \"lowercase\",\n" +
                    "            \"keyword_search\"\n" +
                    "          ]\n" +
                    "        }\n" +
                    "      }\n" +
                    "    }\n" +
                    "  }\n" +
                    "}").build().getAsMap();
    System.out.println("test = " + test);

Output:

test = {
    settings.analysis.filter.keyword_search.type=edge_ngram,
    settings.number_of_shards=3,
    settings.analysis.analyzer.keyword.filter.0=lowercase,
    settings.analysis.analyzer.keyword.filter.1=keyword_search,
    settings.analysis.analyzer.keyword.type=custom,
    settings.analysis.analyzer.keyword.tokenizer=whitespace,
    settings.analysis.filter.keyword_search.max_gram=15,
    settings.analysis.filter.keyword_search.min_gram=3
}
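
The flattening seen in this output can be imitated in plain Java. The sketch below is only an illustration of the idea (nested objects become dotted keys, array elements get their index as a key segment, and every value becomes a string); FlattenSketch and its methods are hypothetical names, not the actual ImmutableSettings code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {

    // Recursively turns nested maps and lists into dotted keys with string values.
    public static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        walk("", nested, flat);
        return flat;
    }

    private static void walk(String prefix, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array elements are keyed by position: filter.0, filter.1, ...
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(join(prefix, String.valueOf(i)), list.get(i), out);
            }
        } else {
            // Leaf values (numbers included) are stored as strings.
            out.put(prefix, String.valueOf(node));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("type", "custom");
        analyzer.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("analyzer", analyzer);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);
        System.out.println(flatten(root));
    }
}
```

Note that the top-level "settings" wrapper survives as a key prefix on every entry, and that is the detail that matters for Question 2.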

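Question 2. Why do the settings generated by Jest not work?

Because your use of the settings JSON/map is not the intended one. I created this test to reproduce your case (it is a bit long, but bear with me):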

    @Test
    public void createIndexTemp() throws IOException {
        String index = "so_q_26949195";

        String settingsAsString = "{\n" +
                "  \"settings\": {\n" +
                "    \"number_of_shards\": 3,\n" +
                "    \"analysis\": {\n" +
                "      \"filter\": {\n" +
                "        \"keyword_search\": {\n" +
                "          \"type\":     \"edge_ngram\",\n" +
                "          \"min_gram\": 3,\n" +
                "          \"max_gram\": 15\n" +
                "        }\n" +
                "      },\n" +
                "      \"analyzer\": {\n" +
                "        \"keyword\": {\n" +
                "          \"type\":      \"custom\",\n" +
                "          \"tokenizer\": \"whitespace\",\n" +
                "          \"filter\": [\n" +
                "            \"lowercase\",\n" +
                "            \"keyword_search\"\n" +
                "          ]\n" +
                "        }\n" +
                "      }\n" +
                "    }\n" +
                "  }\n" +
                "}";
        Map settingsAsMap = ImmutableSettings.builder()
                .loadFromSource(settingsAsString).build().getAsMap();

        CreateIndex createIndex = new CreateIndex.Builder(index)
                .settings(settingsAsString)
                .build();

        JestResult result = client.execute(createIndex);
        assertTrue(result.getErrorMessage(), result.isSucceeded());

        GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
        result = client.execute(getSettings);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());

        Analyze analyze = new Analyze.Builder()
                .index(index)
                .analyzer("keyword")
                .source("Expecting many tokens")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);

        analyze = new Analyze.Builder()
                .analyzer("keyword")
                .source("Expecting single token")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);

        admin().indices().delete(new DeleteIndexRequest(index)).actionGet();

        createIndex = new CreateIndex.Builder(index)
                .settings(settingsAsMap)
                .build();

        result = client.execute(createIndex);
        assertTrue(result.getErrorMessage(), result.isSucceeded());

        getSettings = new GetSettings.Builder().addIndex(index).build();
        result = client.execute(getSettings);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());

        analyze = new Analyze.Builder()
                .index(index)
                .analyzer("keyword")
                .source("Expecting many tokens")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
    }

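When you run it, you will see that in the settingsAsMap case the actual settings are completely wrong (settings contains another settings, which is your JSON, when the two should have been merged), and so the analysis fails.

Why is this not the intended usage?

Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (which the ImmutableSettings class does by default), it should not have the top-level element settings; if the data is not flattened, it may have that top-level element (which is why the settingsAsString test case works).

tl;dr:

Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).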
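
If the JSON file has to keep its top-level "settings" wrapper (for example because the same file is also sent with curl), one possible workaround is to strip that prefix from the flattened map before passing it to CreateIndex.Builder(...).settings(...). The following is a plain-Java sketch under that assumption; StripSettingsPrefix and stripTopLevel are hypothetical names, not part of Jest or Elasticsearch, and the result has not been verified against a running cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StripSettingsPrefix {

    // Removes a leading "<wrapper>." from every key that has it;
    // keys without the prefix are copied unchanged.
    public static Map<String, String> stripTopLevel(Map<String, String> flat, String wrapper) {
        String prefix = wrapper + ".";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            String stripped = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            result.put(stripped, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A few of the flattened entries shown in the answer's output.
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        flat.put("settings.analysis.analyzer.keyword.filter.0", "lowercase");
        System.out.println(stripTopLevel(flat, "settings"));
    }
}
```

The simpler options remain the ones from the answer: drop the wrapper from the file, or send the JSON as a string so it is never flattened.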

answered 2014-11-16T11:02:50.100