I am trying to run a simple facet request over a field whose values contain more than one word (e.g. 'Name1 Name2', sometimes with dots and commas inside), but what I get is:

 "terms" : [{
    "term" : "Name1",
    "count" : 15
},
{
    "term" : "Name2",
    "count" : 15
}]

So the field value is split on spaces, and the facet is then computed over the individual words rather than over the whole value.

Query example:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": [
        "dataset"
      ],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "field": [
          "speciesName"
        ],
        "size": 50000
      }
    }
  }
}'

2 Answers

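Your field should not be analyzed, or at least it should not be tokenized. If you want to index the field without tokenizing it, you need to update the mapping and then reindex.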
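
A minimal sketch of what such a mapping could look like on the Elasticsearch releases that still have facets, assuming you create a fresh index and reindex your documents into it (as far as I know, the index setting of an existing field cannot be changed in place). The index name idx_occurrence_v2, the shell variables and the create_index helper are illustrative and not from the question; the type and field names are taken from the query above.

```shell
# Hypothetical name for the new index; the old index keeps its analyzed mapping.
NEW_INDEX="idx_occurrence_v2"

# Index-creation body: speciesName is indexed as one untokenized term,
# so a terms facet on it counts whole values such as "Name1 Name2".
MAPPING='{
  "mappings": {
    "Occurrence": {
      "properties": {
        "speciesName": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Creates the new index with the mapping above (assumes my_server is reachable).
create_index() {
  curl -XPUT "http://my_server:9200/$NEW_INDEX" -d "$MAPPING"
}
```

Call create_index once, reindex the documents into the new index, and then run the same facet request against it.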

answered 2012-09-11T18:17:50.580

First of all, javanna provided a very good answer from a practical perspective. However, for the sake of completeness, I want to mention that in some cases there is a way to do it without reindexing the data.

If the speciesName field is stored and your queries produce a relatively small number of results, you can use script_field to retrieve the stored field values:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": ["dataset"],
      "query": "2",
      "default_operator": "AND"
    }
  },
  "facets": {
    "test": {
      "terms": {
        "script_field": "_fields['\''speciesName'\''].value",
        "size": 50000
      }
    }
  }
}
'

As a result of this query, Elasticsearch will retrieve the speciesName field for every record in your result set and construct the facets from these values. Needless to say, if your result set contains millions of records, this query may be slow.

Similarly, if the field is not stored but the record source is, you can use a script_field facet to retrieve the field values from the source:

......
"script_field": "_source['\''speciesName'\'']",
......

Again, the source of each record in the result set will be retrieved and parsed, so you might need some patience to run this query on a large set of records.

answered 2012-09-12T03:16:47.117