I have user documents that contain many tags. Here is the mapping:

{
  "user" : {
    "properties" : {
      "tags" : {
        "type" : "nested",
        "properties" : {
          "id" : {
            "type" : "string",
            "index" : "not_analyzed",
            "store" : "yes"
          },
          "current" : {
            "type" : "boolean"
          },
          "type" : {
            "type" : "string"
          },
          "value" : {
            "type" : "multi_field",
            "fields" : {
              "value" : {
                "type" : "string",
                "analyzer" : "name_analyzer"
              },
              "value_untouched" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              }
            }
          }
        }
      }
    }
  }
}

Here are sample user documents:

User 1:

{
  "created_at": 1317484762000,
  "updated_at": 1367040856000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "company",
      "value": "alma connect",
      "id": "58ad4afcc8415216ea451339aaecf311ed40e132"
    },
    {
      "type": "company",
      "value": "Google",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
      "current": true
    },
    {
      "type": "discipline",
      "value": "B.Tech.",
      "id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
      "institute_id": "4fb424a5addf32296f00013a"
    }
  ]
}

User 2:

{
  "created_at": 1318513355000,
  "updated_at": 1364888695000,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
    },
    {
      "type": "college",
      "value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
      "id": "d20730345465a974dc61f2132eb72b04e2f5330c"
    },
    {
      "type": "company",
      "value": "Alma Connect",
      "id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
    }    
  ]
}

User 3:

{
  "created_at": 1318513355001,
  "updated_at": 1364888695010,
  "tags": [
    {
      "type": "college",
      "value": "Dhirubhai Ambani Institute of Information and Communication Technology",
      "id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
    },
    {
      "type": "sector",
      "value": "Website and Software Development",
      "id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
    }    
  ]
}

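Searching over the ES documents above, I want to build a query that returns users who have a given company tag in their nested tag documents, or users who do not have any company tag at all. What should my search query be?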

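For example, with the documents above, a search for the google tag should return 'user 1' and 'user 3' (user 1 has the company tag google, and user 3 has no company tag). User 2 is not returned because it has a company tag, but none with the value google.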

1 Answer

Not simple at all, mainly because of the "does not have any type:company tag" clause. Here is what I came up with:

{
  "or" : {
    "filters" : [ {
      "nested" : {
        "filter" : {
          "and" : {
            "filters" : [ {
              "term" : {
                "tags.value" : "google"
              }
            }, {
              "term" : {
                "tags.type" : "company"
              }
            } ]
          }
        },
        "path" : "tags"
      }
    }, {
      "not" : {
        "filter" : {
          "nested" : {
            "filter" : {
              "term" : {
                "tags.type" : "company"
              }
            },
            "path" : "tags"
          }
        }
      }
    } ]
  }
}

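It contains an or filter with two nested clauses: the first finds documents that have both tags.type:company and tags.value:google, while the second finds all documents that do not have any tags.type:company.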
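
To make the intended semantics concrete, here is a small Python sketch that applies the same two conditions to the three sample users from the question. It does not talk to Elasticsearch; the `USERS` dictionary and the `matches` helper are illustrative names, and lowercasing the tag value is an assumption standing in for whatever `name_analyzer` does at index time.

```python
# Sample users from the question, reduced to (type, value) pairs per tag.
USERS = {
    "user 1": [
        ("college", "Dhirubhai Ambani Institute of Information and Communication Technology"),
        ("company", "alma connect"),
        ("company", "Google"),
        ("discipline", "B.Tech."),
    ],
    "user 2": [
        ("college", "Dhirubhai Ambani Institute of Information and Communication Technology"),
        ("college", "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad"),
        ("company", "Alma Connect"),
        ("sector", "Website and Software Development"),
    ],
    "user 3": [
        ("college", "Dhirubhai Ambani Institute of Information and Communication Technology"),
        ("sector", "Website and Software Development"),
    ],
}


def matches(tags, company):
    """Mirror the or filter: (company tag with this value) OR (no company tag)."""
    # First nested clause: a single tag that is both type:company and value:<company>.
    # Lowercasing is an assumption about how the analyzer normalizes values.
    has_company = any(t == "company" and v.lower() == company for t, v in tags)
    # Second clause is the negation of "some tag has type:company".
    has_any_company = any(t == "company" for t, _ in tags)
    return has_company or not has_any_company


result = [name for name, tags in USERS.items() if matches(tags, "google")]
print(result)  # ['user 1', 'user 3']
```

User 2 drops out because it fails both conditions: it has a company tag, but not one whose value is google.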

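This needs to be optimized, because and/or/not filters do not take advantage of the cache for filters that work with bitsets, as the term filter does. It would be better to spend some time finding a way to use a bool filter and get the same result. Have a look at this article to know more.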
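
One possible shape for that bool filter rewrite is sketched below. It is untested against a live cluster and is not part of the original answer. It assumes that `should` can stand in for `or`, `must` for `and`, and a `bool` with only `must_not` for `not`, using the same filter-style `nested` syntax as above. The helper name `company_or_none_filter` is hypothetical. The filter is built as a Python dict and printed as JSON so it can be pasted into a request body.

```python
import json


def company_or_none_filter(company):
    """Build a bool-filter version of the or/and/not filter (a sketch, not verified)."""
    # Nested clause: one tag that is both type:company and value:<company>.
    has_company = {
        "nested": {
            "path": "tags",
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"tags.type": "company"}},
                        {"term": {"tags.value": company}},
                    ]
                }
            },
        }
    }
    # Nested clause: any tag with type:company (negated below).
    any_company = {
        "nested": {
            "path": "tags",
            "filter": {"term": {"tags.type": "company"}},
        }
    }
    # should = either clause may match; must_not negates the "any company" clause.
    return {
        "bool": {
            "should": [
                has_company,
                {"bool": {"must_not": [any_company]}},
            ]
        }
    }


bool_filter = company_or_none_filter("google")
print(json.dumps(bool_filter, indent=2))
```

Whether this actually caches better than the and/or/not form depends on the Elasticsearch version in use, so it is worth comparing both against real data before switching.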

Answered 2013-05-06T20:31:52.320