sql - How to combine attributes/fields in Elasticsearch

Question

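While designing an application, I am trying to work out whether ElasticSearch is the right tool for it (and, if so, how to implement it). Any advice would be much appreciated!

My application needs to store (many) documents, each represented as a sequence of words. I also want to associate information with each word. For example, suppose I want to associate each word with its length. I would then have something like this: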

The      house   is      yellow
3        5       2       6

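Now I want to run queries such as "give me the words of length 2 that are followed by the word 'yellow'". In a relational database I would store the word form and the length as separate attributes, for example: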

Word        Length        N
---------------------------
the           3           1
house         5           2
is            2           3
yellow        6           4

(where N is the position of the word), and in SQL I would do something like this:

SELECT d1.word, d1.N
FROM   documents AS d1
WHERE  d1.word = 'yellow'
  AND  EXISTS (SELECT 1
               FROM   documents AS d2
               WHERE  d2.length = 2
                 AND  (d1.N - d2.N = 1 OR d2.N - d1.N = 1));
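The intent of the query above can be checked end to end with a small sketch. It assumes an in-memory SQLite database and a `documents` table with the columns `word`, `length`, and `N` from the table above. The self-join form and the helper name `neighbours_of` are illustrative choices, not part of the original schema.

```python
import sqlite3

def neighbours_of(conn, word, length):
    """Return (word, N) rows for `word` that sit directly next to a word of the given length."""
    sql = """
        SELECT d1.word, d1.N
        FROM documents AS d1
        JOIN documents AS d2 ON ABS(d1.N - d2.N) = 1
        WHERE d1.word = ? AND d2.length = ?
    """
    return conn.execute(sql, (word, length)).fetchall()

# Load the example sentence "The house is yellow" into a throwaway table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (word TEXT, length INTEGER, N INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [("the", 3, 1), ("house", 5, 2), ("is", 2, 3), ("yellow", 6, 4)],
)

result = neighbours_of(conn, "yellow", 2)
```

Like the SQL above, this matches a neighbour on either side; restricting the join to `d1.N - d2.N = 1` would keep only the "followed by" direction.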

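I am struggling to achieve the same thing in ElasticSearch. I have read the online manual and reference books, but I cannot figure out how to do this with ES. So any advice would be much appreciated.

Bear in mind that the database will hold many word-related attributes, and I need to query arbitrary combinations of them. These attributes are precomputed and loaded into the database offline.

Thanks!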

score 0 · Accepted Answer

First of all, thank you for your answer. I have read the information and examples on custom analyzers, but I still don't know how to do it.

Here is the document mapping I created:

    "mappings": {
        "Sentence": {
            "properties": {
                "word": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "attributes": {
                    "properties": {
                        "length": {
                            "type": "integer",
                            "index_analyzer": "standard"
                        },
                        "N": {
                            "type": "integer",
                            "index_analyzer": "standard"
                        }
                    }
                }
            }
        }
    }

Here is the indexed document:

curl -XPUT http://localhost:9200/documents/Sentence/1 -d '
{
    "Sentence": [
        {"word": "the",
         "attributes": {
            "length": 3,
            "N": 1
         }
        },
        {"word": "house",
         "attributes": {
            "length": 5,
            "N": 2
         }
        },
        {"word": "is",
         "attributes": {
            "length": 2,
            "N": 3
         }
        },
        {"word": "yellow",
         "attributes": {
            "length": 6,
            "N": 4
         }
        }
    ]
}'
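Since the attributes are precomputed offline, the request body above could be generated rather than written by hand. The following is a minimal sketch, assuming whitespace tokenization and lowercasing (which matches "The" being stored as "the" in the example); the function name `build_sentence_doc` is hypothetical.

```python
import json

def build_sentence_doc(text):
    """Turn a whitespace-separated sentence into the nested document shape used above."""
    words = text.lower().split()
    return {
        "Sentence": [
            # N is the 1-based position of the word in the sentence.
            {"word": w, "attributes": {"length": len(w), "N": i}}
            for i, w in enumerate(words, start=1)
        ]
    }

doc = build_sentence_doc("The house is yellow")
body = json.dumps(doc)  # JSON string usable as the body of the PUT request
```

Further per-word attributes would be added as extra keys inside `attributes`.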

I tried to run the earlier query ("give me the words of length 2 that are followed by the word 'yellow'") using a span query:

curl -XPOST 'http://localhost:9200/documents/Sentence/_search?pretty' -d '
{
    "query": {
        "span_near": {
            "clauses": [
                {"span_term": {"word": "yellow"}},
                {"span_term": {"length": 2}}
            ],
            "slop": 0
        }
    }
}'

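But this does not work, because all clauses must be on the same field. So I gave up on that option (span queries).

How can I create a custom analyzer to run the query I want?

Thank you.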
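To make the restriction concrete: a span query compares token positions within one field, so a word in one field and a length in another share no positions to compare. The plain-Python sketch below (not Elasticsearch code) illustrates one possible direction, assumed here purely for illustration: place each word and a synthetic length token at the same position in a single token stream, then test for adjacent positions. The `len_<n>` token format and both function names are hypothetical.

```python
from collections import defaultdict

def build_positions(words):
    """Map each token to the set of positions it occupies in one shared stream.

    Every position holds two tokens: the word itself and a synthetic
    length token of the form "len_<n>" (a hypothetical naming scheme).
    """
    positions = defaultdict(set)
    for pos, word in enumerate(words, start=1):
        positions[word].add(pos)
        positions[f"len_{len(word)}"].add(pos)
    return positions

def followed_by(positions, first, second):
    """Return the positions of `first` that are immediately followed by `second`."""
    second_positions = positions.get(second, set())
    return sorted(p for p in positions.get(first, set()) if p + 1 in second_positions)

index = build_positions(["the", "house", "is", "yellow"])
hits = followed_by(index, "len_2", "yellow")
```

Here `hits` holds the positions of length-2 words directly before "yellow". Whether a custom analyzer can emit such stacked tokens for a given Elasticsearch version is not established by this sketch.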
