lucene - How do I deal with duplicated data in Elasticsearch?

Source: https://stackoverflow.com/questions/14060845 2012-12-27T20:26:03.573

Viewed 769 times

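I have used parent-child mapping to normalize my data, but as far as I know there is no way to get any fields from the _parent document.

Here is the mapping of my index: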

{
  "mappings": {
    "building": {
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "flat": {
      "_parent": {
        "type": "building"
      },
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "room": {
      "_parent": {
        "type": "flat"
      },
      "properties": {
        "name": {
          "type": "string"
        },
        "floor": {
          "type": "long"
        }
      }
    }
  }
}

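Now I am trying to find the best way to store flat_name and building_name in the room type. I won't query on these fields, but I need to get them back when I query on other fields such as floor.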

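There will be millions of rooms and I don't have much memory, so I suspect these duplicated values may cause out-of-memory problems. For now, the flat_name and building_name fields have the "index": "no" property, and I have turned on compression for the _source field.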
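For reference, the denormalized room type described above would look roughly like the fragment below. This is only a sketch of my current setup: it assumes the two duplicated fields are plain strings and that _source compression is switched on with the "compress" option, which is how older Elasticsearch versions expose it.

{
  "room": {
    "_parent": {
      "type": "flat"
    },
    "_source": {
      "compress": true
    },
    "properties": {
      "name": {
        "type": "string"
      },
      "floor": {
        "type": "long"
      },
      "flat_name": {
        "type": "string",
        "index": "no"
      },
      "building_name": {
        "type": "string",
        "index": "no"
      }
    }
  }
}

With "index": "no" the two name fields are not searchable and are only returned as part of _source, so the duplication costs stored (compressed) bytes rather than extra index terms.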

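Do you have any efficient suggestions for avoiding the duplicated values, such as running multiple queries or some hacky way of getting fields from the _parent document, or is denormalizing the data the only way to handle this kind of problem?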

0 answers
