elasticsearch - How to replace a string without regex in a Painless inline script on AWS Elasticsearch?

Question

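The type of the `level` field in my documents has changed from `keyword` to `short`, and I am trying to reindex the existing data so that I can use it in Kibana charts. The old data contains values such as "100%", "error", or just the empty string "".

I only want integers in the new index. I am using the Reindex API (newlines added to make the snippet more readable):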

curl -s -X POST -H 'Content-Type: application/json' https://search-host.us-east-1.es.amazonaws.com/_reindex -d '{
  "source": {
    "index": "old-index"
  },  
  "dest": {
    "index": "new-index"
  },  
  "script": {
    "inline": "
        if (ctx._source.level == \"error\" || ctx._source.level == \"\")
        {
            ctx._source.level = -1
        } else {
            ctx._source.level = Integer.valueOf(ctx._source.level)
        }
    "
  }
}'

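But I get the error "java.lang.String cannot be cast to java.lang.Number", because the value ends with a "%" sign.

In addition, regular expressions are not enabled on my AWS Elasticsearch domain, so I cannot do this the way I had in mind, and the replaceAll variants do not work for me. On a self-hosted ES it might look something like this (untested): /(%)?/.matcher(doc['level'].value).replaceAll('$1')

But on AWS ES I get this: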

Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops.

Is it possible to replace a string in Painless without regular expressions?

score 2 · Accepted Answer

"script": {
    "lang":"painless",
    "source": """

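      // Helper: swaps oldValue for newValue without regex. It overwrites the last
      // piece produced by the split, so it is meant for a token at the end of the
      // string, as in the usage sample below.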
      String replace(String word, String oldValue, String newValue) {
        String[] pieces = word.splitOnToken(oldValue);
        int lastElIndex = pieces.length-1;
        pieces[lastElIndex] = newValue;
        def list = Arrays.asList(pieces);
        return String.join('',list);
      }

      //usage sample
      ctx._source["date"] = replace(ctx._source["date"],"+0000","Z");

    """
}
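For the `level` field from the question, the same no-regex idea can be sketched in plain Java, which Painless closely resembles. This is only an illustration under stated assumptions: the class `LevelParser` and its methods `replaceLiteral` and `parseLevel` are hypothetical names, not part of Elasticsearch, and the logic would still need to be ported into the `script` body and checked against your Painless version.

```java
// Sketch only: plain Java standing in for the Painless script body.
// LevelParser, replaceLiteral and parseLevel are hypothetical names.
final class LevelParser {
    private LevelParser() {}

    /** Replaces every literal occurrence of oldValue with newValue, without regex. */
    static String replaceLiteral(String word, String oldValue, String newValue) {
        if (word == null || oldValue == null || oldValue.isEmpty()) {
            return word; // nothing to search for
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit = word.indexOf(oldValue, from);
        while (hit >= 0) {
            // copy the text before the match, then the replacement
            out.append(word, from, hit).append(newValue);
            from = hit + oldValue.length();
            hit = word.indexOf(oldValue, from);
        }
        out.append(word, from, word.length()); // copy the remainder
        return out.toString();
    }

    /** Maps the old keyword values to a short: -1 for "error", "" or null. */
    static short parseLevel(String raw) {
        if (raw == null || raw.isEmpty() || raw.equals("error")) {
            return -1;
        }
        // "100%" becomes "100"; any other non-numeric value still throws
        String digits = replaceLiteral(raw, "%", "").trim();
        return Short.parseShort(digits);
    }
}
```

Unlike the helper above, `replaceLiteral` handles a token anywhere in the string. As far as I know, Painless also exposes `String.replace(CharSequence, CharSequence)`, which does literal replacement without regex, so something like `ctx._source.level.replace("%", "")` may be enough on its own; treat that as an assumption to verify on your cluster.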

score 1 · Accepted Answer

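I was trying to do the same thing: ultimately I wanted a full find-and-replace in a string field of one of my indices. Unfortunately, I did not have access to regex either.

Here is the solution I came up with, using an ingest pipeline like this: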

PUT _ingest/pipeline/my-pipeline-id
{
    "description": "Used to update in place",
    "processors": [
        {
            "grok": {
                "field": "myField",
                "patterns": ["%{PART1:field1}%{REMOVAL}%{PART2:field2}"],
                "pattern_definitions": {
                    "PART1": "start",
                    "REMOVAL": "(toRemove){0,1}",
                    "PART2": ".+"
                },
                "ignore_missing": true
            }
        },
        {
            "script": {
                "lang": "painless",
                "inline": "ctx.myField = ctx.field1 + ctx.field2"
            }
        },
        {
            "script": {
                "lang": "painless",
                "inline": "ctx.remove('field1'); ctx.remove('field2')"
            }
        }
    ]
}

Then you run it (I did it with an update by query):

POST /index/type/_update_by_query?pipeline=my-pipeline-id
{
    "query": {
        "match": {
            "id": "123456789"
        }
    }
}
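The intended effect of the pipeline on a single value can be sketched in plain Java. This is not a simulation of grok: it only illustrates the idea of dropping one optional token that directly follows a known prefix, and it leaves non-matching values unchanged, which may differ from how the grok processor reacts to a failed match. `PrefixTokenRemover` and `removeAfterPrefix` are hypothetical names.

```java
// Sketch only: mirrors the pattern "start" + optional "toRemove" + rest.
// PrefixTokenRemover and removeAfterPrefix are hypothetical names.
final class PrefixTokenRemover {
    private PrefixTokenRemover() {}

    static String removeAfterPrefix(String value, String prefix, String toRemove) {
        if (value == null || !value.startsWith(prefix)) {
            return value; // assumption: leave values without the prefix untouched
        }
        String rest = value.substring(prefix.length());
        // Drop the token only when something follows it, like ".+" in PART2
        if (rest.startsWith(toRemove) && rest.length() > toRemove.length()) {
            rest = rest.substring(toRemove.length());
        }
        return prefix + rest; // corresponds to ctx.field1 + ctx.field2
    }
}
```

For example, `removeAfterPrefix("starttoRemoveXYZ", "start", "toRemove")` yields `startXYZ`, while a value without the token comes back unchanged.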

Note

I am using ES 5.5. Some of the syntax changed in version 6, but the process remains the same.
