elasticsearch - Best way to index arbitrary attribute-value pairs in Elasticsearch

Question

I am trying to index documents in Elasticsearch that have attribute-value pairs. Example documents:

{
    id: 1,
    name: "metamorphosis",
    author: "franz kafka"
}

{
    id: 2,
    name: "techcorp laptop model x",
    type: "computer",
    memorygb: 4
}

{
    id: 3,
    name: "ss2014 formal shoe x",
    color: "black",
    size: 42,
    price: 124.99
}

I then need to run queries such as:

1. "author" EQUALS "franz kafka"
2. "type" EQUALS "computer" AND "memorygb" GREATER THAN 4
3. "color" EQUALS "black" OR ("size" EQUALS 42 AND "price" LESS THAN 200.00)

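What is the best way to store these documents so that they can be queried efficiently? Should I store them exactly as shown in the examples? Or should I store them like this: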

{
    fields: [
        { "type": "computer" },
        { "memorygb": 4 }
    ]
}

Or like this:

{
    fields: [
        { "key": "type", "value": "computer" },
        { "key": "memorygb", "value": 4 }
    ]
}

How should I map my index so that I can run both my equality and range queries?

score 10 · Accepted Answer

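In case anyone is still looking for an answer, I wrote a post about how to index arbitrary data into Elasticsearch and then search it by specific fields and values, all without blowing up your index mapping.

The post: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/

In short, you need to create the special index described in the post. Then you flatten your data with the flattenData function (https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4). The flattened data can then be safely indexed into the Elasticsearch index.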
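The post has the actual index definition; as a rough idea of what such an index could look like, here is a minimal sketch. It assumes the typeless mapping syntax of Elasticsearch 7 or later, the nested field name flatData used in the queries in this answer, and field types chosen for illustration. The value_double field in particular is an assumption and does not appear in the examples here.

```javascript
// Hypothetical mapping body for an index that stores flattened entries.
// Field names follow the flattened output shown in this answer;
// the field types and "value_double" are assumptions, not taken from the post.
const flatDataMapping = {
    mappings: {
        properties: {
            flatData: {
                // "nested" keeps each key/value entry as its own hidden document,
                // so a key and its value are matched together.
                type: "nested",
                properties: {
                    key: { type: "keyword" },
                    type: { type: "keyword" },
                    key_type: { type: "keyword" },
                    value_string: { type: "text" },
                    value_long: { type: "long" },
                    value_double: { type: "double" }
                }
            }
        }
    }
};

console.log(JSON.stringify(flatDataMapping, null, 4));
```

Because every document only ever uses this fixed set of fields, new attribute names become values of flatData.key rather than new fields in the mapping. With an index like this in place, each document is flattened before it is indexed.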

For example:

flattenData({
    id: 1,
    name: "metamorphosis",
    author: "franz kafka"
});

will produce:

[
    {
        "key": "id",
        "type": "long",
        "key_type": "id.long",
        "value_long": 1
    },
    {
        "key": "name",
        "type": "string",
        "key_type": "name.string",
        "value_string": "metamorphosis"
    },
    {
        "key": "author",
        "type": "string",
        "key_type": "author.string",
        "value_string": "franz kafka"
    }
]

and

flattenData({
    id: 2,
    name: "techcorp laptop model x",
    type: "computer",
    memorygb: 4
});

will produce:

[
    {
        "key": "id",
        "type": "long",
        "key_type": "id.long",
        "value_long": 2
    },
    {
        "key": "name",
        "type": "string",
        "key_type": "name.string",
        "value_string": "techcorp laptop model x"
    },
    {
        "key": "type",
        "type": "string",
        "key_type": "type.string",
        "value_string": "computer"
    },
    {
        "key": "memorygb",
        "type": "long",
        "key_type": "memorygb.long",
        "value_long": 4
    }
]
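To make the transformation above concrete, here is a simplified sketch of such a flattening step. It is not the gist's implementation: the function name flattenTopLevel is hypothetical, it only handles top-level string and number values, and the type name "double" for non-integer numbers is an assumption.

```javascript
// Illustrative sketch only: flattens top-level string and number values.
// Nested objects, arrays, booleans and dates are skipped on purpose.
function flattenTopLevel(doc) {
    const entries = [];
    for (const [key, value] of Object.entries(doc)) {
        let type;
        if (typeof value === "string") {
            type = "string";
        } else if (Number.isInteger(value)) {
            type = "long";
        } else if (typeof value === "number") {
            type = "double"; // assumed type name for non-integer numbers
        } else {
            continue; // value kinds not covered by this sketch
        }
        entries.push({
            key: key,
            type: type,
            key_type: key + "." + type,
            ["value_" + type]: value // e.g. value_long or value_string
        });
    }
    return entries;
}

console.log(JSON.stringify(
    flattenTopLevel({ id: 1, name: "metamorphosis", author: "franz kafka" }),
    null,
    4
));
```

Note that this sketch decides the type per value, so a price of 200.00 would be treated as a long in JavaScript while 124.99 would be a double. A real implementation would need a policy for that.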

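You can then build Elasticsearch queries to search your data. Each query should specify both the key and the type of the value. If you are not sure which keys or types the index contains, you can run an aggregation to find out; this is also discussed in the post.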
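The post covers the aggregation in detail. As a hedged sketch, a request body for listing the key and type combinations present in the index might be built like this, assuming the nested field is named flatData and key_type is indexed as a keyword. The function name buildKeyTypeAggregation and the aggregation names are made up for illustration.

```javascript
// Builds a search body that returns no hits, only the distinct key_type values.
// Assumes a nested "flatData" field with a keyword sub-field "key_type".
function buildKeyTypeAggregation(maxBuckets) {
    return {
        size: 0, // skip the hits, only the aggregation result is of interest
        aggs: {
            flat_data: {
                nested: { path: "flatData" },
                aggs: {
                    key_types: {
                        terms: { field: "flatData.key_type", size: maxBuckets }
                    }
                }
            }
        }
    };
}

console.log(JSON.stringify(buildKeyTypeAggregation(100), null, 4));
```

Each bucket key in the response would then be a string such as author.string or memorygb.long, which tells you which value field to query for a given key.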

For example, to find documents where author == "franz kafka", you need to execute the following query:

{
    "query": {
        "nested": {
            "path": "flatData",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"flatData.key": "author"}},
                        {"match": {"flatData.value_string": "franz kafka"}}
                    ]
                }
            }
        }
    }
}

To find documents where type == "computer" and memorygb > 4, you need to execute the following query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "flatData",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"flatData.key": "type"}},
                                    {"match": {"flatData.value_string": "computer"}}
                                ]
                            }
                        }
                    }
                },
                {
                    "nested": {
                        "path": "flatData",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"flatData.key": "memorygb"}},
                                    {"range": {"flatData.value_long": {"gt": 4}}}
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
}

Here, because we want the same document to match both conditions, we use an outer bool query whose must clause wraps the two nested queries.
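The same pattern extends to the third query from the question, color == "black" OR (size == 42 AND price < 200.00). The following is a sketch under assumptions: the helper nestedCondition is hypothetical, and it assumes non-integer numbers are stored in a field called value_double, which is not shown in the examples above.

```javascript
// Wraps one key/value condition in a nested query on the assumed "flatData" path.
function nestedCondition(key, valueClause) {
    return {
        nested: {
            path: "flatData",
            query: {
                bool: {
                    must: [{ term: { "flatData.key": key } }, valueClause]
                }
            }
        }
    };
}

const colorIsBlack = nestedCondition("color", { match: { "flatData.value_string": "black" } });
const sizeIs42 = nestedCondition("size", { term: { "flatData.value_long": 42 } });
// "value_double" is an assumed field name for non-integer numbers.
const priceBelow200 = nestedCondition("price", { range: { "flatData.value_double": { lt: 200 } } });

// OR is expressed with "should", AND with an inner bool "must".
const thirdQuery = {
    query: {
        bool: {
            should: [
                colorIsBlack,
                { bool: { must: [sizeIs42, priceBelow200] } }
            ],
            minimum_should_match: 1
        }
    }
};

console.log(JSON.stringify(thirdQuery, null, 4));
```

Each condition stays in its own nested query because every key/value entry is a separate nested object; the boolean logic across attributes lives in the outer bool query.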

score 1 · Accepted Answer

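Elasticsearch is a schemaless data store that allows new attributes to be indexed dynamically, and having optional fields does not affect performance. Your first mapping is absolutely fine, and you can build boolean queries around the dynamic attributes. There is no inherent performance benefit in making them nested fields; they would be flattened at index time anyway into fields such as fields.type, fields.memorygb, and so on.

By contrast, the last mapping, where you store the data as key-value pairs, will have a performance impact, because you have to query two different indexed fields, i.e. key='memorygb' and value=4.

Take a look at the documentation on dynamic mapping:

One of the most important features of Elasticsearch is its ability to be schemaless. There is no performance overhead if an object is dynamic; the ability to turn it off is provided as a safety mechanism so that "malformed" objects won't, by mistake, index data that we do not wish to be indexed.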
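To illustrate the first approach, here is a sketch of the second query from the question, written against documents stored exactly as in the question's examples. It assumes dynamic mapping has indexed type as a text field and memorygb as a numeric field; the variable name plainQuery is only for illustration.

```javascript
// Query for: type == "computer" AND memorygb > 4,
// against documents indexed as-is with dynamic mapping (an assumption).
const plainQuery = {
    query: {
        bool: {
            must: [
                { match: { type: "computer" } },
                { range: { memorygb: { gt: 4 } } }
            ]
        }
    }
};

console.log(JSON.stringify(plainQuery, null, 4));
```

Each attribute is addressed directly by its field name, so a condition needs one clause rather than a key clause plus a value clause.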

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-object-type.html

score 0 · Accepted Answer

You need a filtered query from here:

You have to use a range query together with a match query.

answered 2015-02-18T12:48:51.340
