I'm running a search with a range facet:

{
    "query": {
        "match_all": {}
    },
    "facets": {
        "prices": {
            "range": {
                "field": "product_price",
                "ranges": [
                    {"from": 0, "to": 200},
                    {"from": 200, "to": 400},
                    {"from": 400, "to": 600},
                    {"from": 600, "to": 800},
                    {"from": 800}
                ]
            }
        }
    }
}

As expected, the response contains the ranges:

[
  {
    "from": 0.0,
    "to": 200.0,
    "count": 0,
    "total_count": 0,
    "total": 0.0,
    "mean": 0.0
  },
  {
    "from": 200.0,
    "to": 400.0,
    "count": 1,
    "min": 399.0,
    "max": 399.0,
    "total_count": 1,
    "total": 399.0,
    "mean": 399.0
  },
  {
    "from": 400.0,
    "to": 600.0,
    "count": 5,
    "min": 499.0,
    "max": 599.0,
    "total_count": 5,
    "total": 2886.0,
    "mean": 577.2
  },
  {
    "from": 600.0,
    "to": 800.0,
    "count": 3,
    "min": 690.0,
    "max": 790.0,
    "total_count": 3,
    "total": 2179.0,
    "mean": 726.3333333333334
  },
  {
    "from": 800.0,
    "count": 2,
    "min": 899.0,
    "max": 990.0,
    "total_count": 2,
    "total": 1889.0,
    "mean": 944.5
  }
]

In every range of the response, count and total_count are the same. Does anyone know what the difference between them is, and which one I should use?

1 Answer

Very good question! This part is tricky, since you see the same values most of the time. The difference shows up when you use key_field and value_field: you can then compute the ranges on one field and the aggregated data (min, max, total_count, total and mean) on another field. For instance, you could compute the ranges on a popularity field and look at the aggregated data of a price field, to see what kind of prices you have for each range of popularity; maybe people like cheap products, or maybe not?

Now imagine your products can have multiple prices, for example a different price per country. This is when count differs from total_count. Let's have a look at an example.

Let's index a couple of documents that contain a popularity field and a price field, which can have multiple values:

{
    "popularity": 50,
    "price": [28,30,32]
}

and

{
    "popularity": 120,
    "price": [50,54]
}

Let's now run the following search request, which builds a range facet using the popularity field as key and the price field as value:

{
    "query": {
        "match_all": {}
    },
    "facets": {
        "popularity_prices": {
            "range": {
                "key_field": "popularity",
                "value_field": "price",
                "ranges": [
                    {"to": 100},
                    {"from": 100}
                ]
            }
        }
    }
}

Here is the resulting facet:

{
    "popularity_prices": {
      "_type": "range",
      "ranges": [
        {
          "to": 100,
          "count": 1,
          "min": 28,
          "max": 32,
          "total_count": 3,
          "total": 90,
          "mean": 30
        },
        {
          "from": 100,
          "count": 1,
          "min": 50,
          "max": 54,
          "total_count": 2,
          "total": 104,
          "mean": 52
        }
      ]
    }
}

It should be clearer now what total_count is. It relates to the value_field (price): 3 different price values fall into the first range, but they all come from the same document. count, on the other hand, is the number of documents that fall into the range.
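
To make the distinction concrete, here is a minimal Python sketch that recomputes the facet above in memory. It is not how Elasticsearch implements the facet; the function name `key_value_range_facet` is made up for illustration, and it assumes that "from" is inclusive and "to" is exclusive.

```python
def key_value_range_facet(docs, key_field, value_field, ranges):
    """Recompute range-facet statistics in memory (illustrative only)."""
    results = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        matching_docs = 0  # becomes "count": documents whose key is in range
        values = []        # its length becomes "total_count": values seen
        for doc in docs:
            key = doc[key_field]
            # Assumption: "from" is inclusive, "to" is exclusive.
            if (lo is None or key >= lo) and (hi is None or key < hi):
                matching_docs += 1
                value = doc[value_field]
                # A field may hold one value or a list of values.
                values.extend(value if isinstance(value, list) else [value])
        entry = dict(r)
        entry["count"] = matching_docs
        entry["total_count"] = len(values)
        entry["total"] = sum(values)
        entry["mean"] = sum(values) / len(values) if values else 0.0
        if values:
            entry["min"] = min(values)
            entry["max"] = max(values)
        results.append(entry)
    return results


docs = [
    {"popularity": 50, "price": [28, 30, 32]},
    {"popularity": 120, "price": [50, 54]},
]
facet = key_value_range_facet(
    docs, "popularity", "price", [{"to": 100}, {"from": 100}]
)
for entry in facet:
    print(entry)
```

For the first range, one document matches (count is 1) but it contributes three prices (total_count is 3), which is the same split the facet above reports.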

Now that we know count is about documents while total_count is about field values, we would expect the same behaviour from a normal range facet when the field holds multiple values, right? Unfortunately that doesn't currently happen: the range facet considers only the first value of each field, so count and total_count are always the same. I'm not sure whether that's a bug.
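
The behaviour described above can be modelled with a small Python sketch. It only mirrors what this answer reports (a single value per document is taken into account); it is not Elasticsearch code and has not been checked against any particular version. The function name `first_value_range_facet` and the range boundary of 40 are hypothetical, and "from" is again assumed inclusive and "to" exclusive.

```python
def first_value_range_facet(docs, field, ranges):
    """Model a single-field range facet that looks at one value per document."""
    results = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        values = []
        for doc in docs:
            raw = doc[field]
            # Assumption taken from the answer: only the first value counts.
            first = raw[0] if isinstance(raw, list) else raw
            if (lo is None or first >= lo) and (hi is None or first < hi):
                values.append(first)
        entry = dict(r)
        # One value per matching document, so both counters are equal.
        entry["count"] = len(values)
        entry["total_count"] = len(values)
        entry["total"] = sum(values)
        results.append(entry)
    return results


docs = [{"price": [28, 30, 32]}, {"price": [50, 54]}]
single_field_facet = first_value_range_facet(
    docs, "price", [{"to": 40}, {"from": 40}]
)
for entry in single_field_facet:
    print(entry)
```

Under this model each matching document contributes exactly one value, which is why count and total_count cannot diverge.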

Answered 2013-05-28T08:59:21.037