elasticsearch - Elasticsearch aggregation converts results to lowercase

Question

I've been experimenting with ElasticSearch and ran into a problem when running aggregations.

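I have two endpoints, /A and /B. The first one holds the parents of the objects in the second, so every object in B belongs to exactly one object in A, and an object in A can have one or more children in B. Each object in B therefore has a "parentId" property containing the ID of its parent, an ID that ElasticSearch generated automatically.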

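I want to filter the parents in A by properties of their children in B. To do that, I first filter the children in B by a property and collect their unique parent IDs, which I then use to fetch the parents.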
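A rough Python sketch of the two request bodies this approach needs. The helper names are hypothetical, and step 2 assumes the parents are fetched with an ids query sent to POST http://localhost:9200/test/A/_search; actually sending the requests is left to whatever HTTP client you use.

```python
from __future__ import annotations


def children_search_body(name_prefix: str) -> dict:
    """Step 1 (hypothetical helper): find children whose name starts with
    name_prefix and collect their distinct parentId values in a terms aggregation."""
    return {
        "query": {
            "query_string": {"default_field": "name", "query": f"{name_prefix}*"}
        },
        "aggregations": {"ids": {"terms": {"field": "parentId"}}},
    }


def parents_search_body(parent_ids: list[str]) -> dict:
    """Step 2 (hypothetical helper): fetch the parents whose _id is one of parent_ids."""
    return {"query": {"ids": {"values": parent_ids}}}


step1 = children_search_body("derp2")
step2 = parents_search_body(["AU_ffvwM40Hx1Kh6rfQA"])
```

Step 2 only finds the parents if the IDs collected in step 1 are exactly the stored _id values, including their case. Also note that a terms aggregation returns at most 10 buckets by default, so collecting more parents requires a larger size in the terms clause.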

I send this request:

POST http://localhost:9200/test/B/_search
{
    "query": {
        "query_string": {
            "default_field": "name",
            "query": "derp2*"
        }
    },
    "aggregations": {
        "ids": {
            "terms": {
                "field": "parentId"
            }
        }
    }
}

and get this response:

{
  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjH5u40Hx1Kh6rfQG",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child2"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjD_U40Hx1Kh6rfQF",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child1"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjKqf40Hx1Kh6rfQH",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child3"
        }
      }
    ]
  },
  "aggregations": {
    "ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "au_ffvwm40hx1kh6rfqa",
          "doc_count": 3
        }
      ]
    }
  }
}

For some reason, the aggregation keys come back in lowercase, so I can't use them to request the parent from ElasticSearch:

GET http://localhost:9200/test/A/au_ffvwm40hx1kh6rfqa

Response:
{
  "_index": "test",
  "_type": "A",
  "_id": "au_ffvwm40hx1kh6rfqa",
  "found": false
}

Any idea why this happens?

score 5 · Accepted Answer

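The difference between the hits and the aggregation results is that aggregations operate on the terms that were created at index time, and they return those terms as bucket keys. The hits, on the other hand, return the original _source.

How are these terms created? By the analyzer configured for the field, which in your case is the default one, the standard analyzer. Among other things, this analyzer lowercases every character of each term. As Andrei mentioned, you should map the parentId field as not_analyzed, so that the whole value is indexed as a single, unchanged term (on Elasticsearch 5.0 and later, the equivalent is the keyword field type):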
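To see why the bucket key differs from the _source value, here is a deliberately simplified model in Python. It is not Elasticsearch's real standard analyzer (which splits text on Unicode word boundaries); it only reproduces the two effects that matter here: the value is split into terms and each term is lowercased, whereas a not_analyzed (keyword) field stores the whole value as one unchanged term. The function names are hypothetical.

```python
import re


def simplified_standard_analyzer(value: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on non-word characters,
    then lowercase. Underscores count as word characters, so an ID such as
    AU_ffvwM40Hx1Kh6rfQA stays a single token, as it does in the question."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_analyzer(value: str) -> list[str]:
    """not_analyzed / keyword: the whole value is indexed as one term, unchanged."""
    return [value]


parent_id = "AU_ffvwM40Hx1Kh6rfQA"
print(simplified_standard_analyzer(parent_id))  # ['au_ffvwm40hx1kh6rfqa']
print(keyword_analyzer(parent_id))              # ['AU_ffvwM40Hx1Kh6rfQA']
```

The first result is exactly the bucket key from the question, and since document IDs are matched case-sensitively, GET .../test/A/au_ffvwm40hx1kh6rfqa finds nothing.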

PUT test
{
  "mappings": {
    "B": {
      "properties": {
        "parentId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }   
}

score 4 · Accepted Answer

I'm late to this, but I ran into the same problem and found that it is caused by normalization.

If you want to prevent normalization from turning the aggregated values into lowercase, you have to change the index mapping.

You can check the current mapping by typing the following in the DevTools console:

GET /test/_mapping

In the index structure that comes back, look at the settings of the parentId field.

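If you don't want to change how the field itself behaves but still want to avoid normalization during aggregation, you can add a sub-field to the parentId field.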

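Changing how an existing field is indexed generally means deleting the index and recreating it with the new mapping (or reindexing into a new index). Adding a sub-field like this one can instead be done with the put mapping API, but documents indexed before the change only get the sub-field populated once they are reindexed (for example with _update_by_query).

In your case the mapping would look like this (only the parentId field is shown):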

PUT /test/_mapping
{
  "properties": {
    "parentId": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

Then use the sub-field in your query:

POST http://localhost:9200/test/B/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "derp2*"
    }
  },
  "aggregations": {
    "ids": {
      "terms": {
        "field": "parentId.keyword",
        "order": {"_key": "desc"}
      }
    }
  }
}
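A small Python check that makes the difference visible: it extracts the bucket keys and tests whether each one appears verbatim as a parentId in the hits, which is what you need before using the keys as document IDs for A. The helper names are hypothetical, the responses are trimmed down to one hit, and the keyword variant assumes the bucket key keeps its original case once the aggregation runs on parentId.keyword.

```python
from __future__ import annotations


def parent_ids_from_response(response: dict) -> list[str]:
    """Distinct parent IDs = keys of the "ids" terms aggregation."""
    return [bucket["key"] for bucket in response["aggregations"]["ids"]["buckets"]]


def keys_usable_as_ids(response: dict) -> bool:
    """True if every bucket key appears verbatim as a parentId in the hits' _source,
    i.e. the aggregated values were not normalized (lowercased) at index time."""
    source_ids = {hit["_source"]["parentId"] for hit in response["hits"]["hits"]}
    return all(key in source_ids for key in parent_ids_from_response(response))


def make_response(bucket_key: str) -> dict:
    """Response shaped like the one in the question, trimmed to one hit and one bucket."""
    return {
        "hits": {
            "hits": [
                {"_source": {"parentId": "AU_ffvwM40Hx1Kh6rfQA", "name": "derp2child1"}}
            ]
        },
        "aggregations": {"ids": {"buckets": [{"key": bucket_key, "doc_count": 3}]}},
    }


# Aggregating on the analyzed "parentId" field (bucket key from the question):
analyzed = make_response("au_ffvwm40hx1kh6rfqa")
# Aggregating on "parentId.keyword" (assumed: the key keeps its original case):
keyword = make_response("AU_ffvwM40Hx1Kh6rfQA")

print(keys_usable_as_ids(analyzed))  # False
print(keys_usable_as_ids(keyword))   # True
```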
