I have a question about setting up my Elasticsearch index. I created a table that I index into Elasticsearch through a river. The table is built by a script that queries multiple tables and denormalizes the data, so each row maps 1:1 to a unique id and is easier to index.

For example, I have the fields street, city, state, and zip, which I can query on. Should I keep those fields indexed individually, or concatenate them into one big field, such as address, that contains all of them? Or should I put in the extra time to set up parent-child indexes?

The use case: a customer with billing info comes in from one source, and I want to query Elasticsearch to see whether that customer already exists, or at least get back the closest match.

I know this question is more conceptual than programming, but I can't find any information on best practices.


1 Answer


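For the first part of your question: I would not concatenate the different fields into one field that contains all the information. Keeping separate fields lets you compute facets and aggregations on them, for example how many customers come from a particular city or have a particular zip code. You can still query across the different fields with a match or multi_match query.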
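As a rough sketch of what querying across separate fields could look like, the snippet below only builds a `multi_match` request body as a Python dict. The field names `street`, `city`, `state`, and `zip` come from the question; the helper name `build_customer_lookup` and the sample address are made up for illustration, and sending the body to a cluster is left out.

```python
import json


def build_customer_lookup(street, city, state, zip_code):
    """Build a multi_match query body that searches the separate address fields."""
    text = " ".join([street, city, state, zip_code])
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Field names taken from the question; adjust to your mapping.
                "fields": ["street", "city", "state", "zip"],
            }
        }
    }


# Hypothetical incoming billing address.
body = build_customer_lookup("12 Main St", "New York", "NY", "10001")
print(json.dumps(body, indent=2))
```

The best-scoring hit is then the closest existing customer, which fits the "already exists, or closest result" use case.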

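In addition to keeping the information in separate fields, I would use multi-fields with an analyzed part and a not-analyzed part (fieldname.raw). Again, this allows aggregations, faceting, and sorting on the exact value.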

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

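Think of "New York": if you analyze it, it is stored as ['New', 'York'], and you will not be able to see everyone from 'New York'. Instead you will see everyone from "New" and everyone from "York".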
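To make the "New York" problem concrete, here is a small plain-Python simulation, not Elasticsearch itself. It assumes an analyzer that splits on whitespace and lowercases, and the city list is made-up sample data. It compares counting customers per term on an analyzed field with counting on the untouched raw value.

```python
from collections import Counter

# Made-up sample data: the city value of four customer documents.
cities = ["New York", "New York", "New Orleans", "York"]


def analyzed_terms(value):
    # Rough stand-in for an analyzer: split on whitespace and lowercase.
    return value.lower().split()


# Counting on the analyzed field: every token counts once per document.
analyzed_counts = Counter(
    term for city in cities for term in set(analyzed_terms(city))
)

# Counting on the raw (not analyzed) value: the whole string is one term.
raw_counts = Counter(cities)

print(analyzed_counts)
print(raw_counts)
```

On the analyzed field "new" and "york" each collect three documents and "New York" never appears as a bucket, while the raw values give the per-city counts you would actually want to report.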

The _all field

Elasticsearch has a special _all field that does the concatenation behind the scenes, so you don't have to do it yourself. It can be enabled or disabled.
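A minimal sketch of a query against that field, assuming an Elasticsearch version from the era of the linked 0.90 docs where `_all` is available and enabled (later major releases deprecated and then removed it). The helper name `build_all_query` and the sample text are hypothetical.

```python
def build_all_query(text):
    """Build a match query body that searches the catch-all _all field."""
    return {"query": {"match": {"_all": text}}}


# Hypothetical free-text lookup across everything indexed for a customer.
all_body = build_all_query("12 Main St New York NY 10001")
print(all_body)
```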

Parent-child relationships

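On whether to embed objects in the document or use a parent-child relationship: I think a parent-child relationship fits your case better. Objects embedded in an array (plain inner objects, without the dedicated `nested` mapping type) are stored in a "flattened" way, meaning the information from all the objects in the array is stored as part of one object. Consider the following example: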

You have clients with their orders:

client: 'Samuel Thomson'
    orderline: 'Strong Thinkpad'
    orderline: 'Light Macbook'
client: 'Jay Rizzi'
    orderline: 'Strong Macbook'

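If you search for clients who ordered a "Strong Macbook", then with order lines embedded and flattened like this you get both clients back. That is because 'Samuel Thomson' and his order lines are stored together as ['Strong', 'Thinkpad', 'Light', 'Macbook'], so there is no distinction between the two order lines.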

With parent-child documents, the order lines of the same client are not mixed together and each keeps its own identity.
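The difference can be imitated in a few lines of plain Python. This is only a simulation of the matching behaviour under a simple assumption (whitespace tokenization, all query terms required), not how Elasticsearch is implemented; the function names are made up for illustration.

```python
clients = [
    {"client": "Samuel Thomson", "orderlines": ["Strong Thinkpad", "Light Macbook"]},
    {"client": "Jay Rizzi", "orderlines": ["Strong Macbook"]},
]


def tokens(text):
    # Simplified analysis: lowercase and split on whitespace.
    return set(text.lower().split())


def match_flattened(doc, query):
    # All order lines are merged into one bag of terms for the client.
    bag = set()
    for line in doc["orderlines"]:
        bag |= tokens(line)
    return tokens(query) <= bag


def match_per_orderline(doc, query):
    # Each order line is checked on its own, like a separate child document.
    return any(tokens(query) <= tokens(line) for line in doc["orderlines"])


flattened = [c["client"] for c in clients if match_flattened(c, "Strong Macbook")]
separate = [c["client"] for c in clients if match_per_orderline(c, "Strong Macbook")]

print(flattened)  # ['Samuel Thomson', 'Jay Rizzi']
print(separate)   # ['Jay Rizzi']
```

The flattened variant returns Samuel Thomson as a false positive because "Strong" and "Macbook" come from two different order lines.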

Answered 2014-03-01T10:59:24.287