elasticsearch - Map a book in elasticsearch with many levels, nested vs parent-child relationship

Question

When creating the mappings for an index that can search through multiple books, is it preferable to use nested mappings like the one below, or to use documents with a parent-child relationship?

book: {
  properties: {
    isbn:     {       //- ISBN of the book
      type: 'string'  //- 9783791535661
    },
    title:    {       //- Title of the book
      type: 'string'  //- Alice in Wonderland
    },
    author:   {       //- Author of the book(maybe should be array)
      type: 'string'  //- Lewis Carroll
    },
    category: {       //- Category of the book(maybe should be array)
      type: 'string'  //- Fantasy
    },
    toc: {            //- Array of the chapters in the book
      type: 'nested',
      properties: {
        html: {           //- HTML Content of a chapter
          type: 'string'  //- <!DOCTYPE html><html>...</html>
        },
        title: {          //- Title of the chapter
          type: 'string'  //- Down the Rabbit Hole 
        },
        fileName: {       //- File name of this chapter
          type: 'string'  //- chapter_1.html
        }, 
        firstPage: {      //- The first page of this chapter
          type: 'integer' //- 3
        }, 
        numberOfPages: {  //- How many pages are in this chapter
          type: 'integer' //- 27
        },
        sections: {       //- An array of all of the sections within a chapter
          type: 'nested',
          properties: {
            html: {           //- The html content of a section
              type: 'string'  //- <section>...</section>
            },
            title: {          //- The title of a section
              type: 'string'  //- section number 2 or something
            },
            figures: {        //- Array of the figures within a section
              type: 'nested',
              properties: {
                html: {           //- HTML content of a figure
                  type: 'string'  //- <figure>...</figure>
                },
                caption: {        //- The name of a figure
                  type: 'string'  //- Figure 1
                },
                id: {             //- Id of a figure
                  type: 'string'  //- figure4
                }
              }
            },
            paragraphs: {     //- Array of the paragraphs within a section
              type: 'nested',
              properties: {   
                html: {           //- HTML content of a paragraph
                  type: 'string'  //- <p>...</p>
                },
                id: {             //- Id of a paragraph
                  type: 'string'  //- paragraph3
                }
              }
            }
          }
        }
      }
    }
  }
}

The HTML of an entire book is approximately 250 kB. I would want to query things such as

- the best matching paragraph including its nearest paragraphs on either side
- the best matching section from a single book including any child sections
- the best figure given it is inside a section with a matching title
- etc

I don't really know the specifics of the queries I would want to perform, but it is important to have a lot of flexibility to be able to try out very weird ones without having to change all of my mappings too much.

score 3 · Accepted Answer

If you use the nested type, everything will be contained in the same _source document, which for big books could be quite a mouthful.

Whereas if you use parent/child documents for each chapter and/or section, you might end up with smaller chunks that are easier to chew...
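As a rough illustration of the parent/child option, here is a minimal sketch of what the mappings might look like. It assumes an older Elasticsearch version that still supports multiple types and the `_parent` meta-field (consistent with the `type: 'string'` fields in the question); newer versions replaced this mechanism with a `join` field. The helper name `buildParentChildMappings` and the choice of which fields live on which type are assumptions made for the example.

```javascript
// Sketch only: one type per level, linked through the `_parent` meta-field.
// Assumes a pre-6.0 Elasticsearch; the field selection is illustrative.
function buildParentChildMappings() {
  return {
    book: {
      properties: {
        isbn:   { type: 'string' },
        title:  { type: 'string' },
        author: { type: 'string' }
      }
    },
    chapter: {
      _parent: { type: 'book' },      // each chapter is a child of a book
      properties: {
        title:         { type: 'string' },
        firstPage:     { type: 'integer' },
        numberOfPages: { type: 'integer' }
      }
    },
    section: {
      _parent: { type: 'chapter' },   // each section is a child of a chapter
      properties: {
        title: { type: 'string' },
        html:  { type: 'string' }
      }
    }
  };
}
```

Each chapter and section then becomes its own small document, indexed with a reference to its parent, instead of living inside one large book document.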

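As always, it depends heavily on the queries you want to run, so first think through all the use cases you want to support; you will then be in a better position to decide which approach is best.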

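There is another approach that uses neither nested nor parent/child: it simply involves denormalization. Concretely, you pick the smallest "entity" you want to consider, e.g. a section, and create a standalone document for each one. Each section document would carry fields such as book title, author, chapter title, section title, etc.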
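The denormalization idea can be sketched as a small function that flattens one book, shaped like the nested mapping in the question, into standalone section documents. The function name `toSectionDocs` and the output field names (`bookTitle`, `chapterTitle`, `sectionTitle`) are hypothetical; only `isbn`, `title`, `author`, `toc`, `sections`, and `html` come from the question's mapping.

```javascript
// Sketch only: turn one nested book object into one flat document per section.
// Book-level and chapter-level fields are copied onto every section document.
function toSectionDocs(book) {
  const docs = [];
  for (const chapter of book.toc || []) {
    for (const section of chapter.sections || []) {
      docs.push({
        isbn: book.isbn,
        bookTitle: book.title,        // hypothetical field name
        author: book.author,
        chapterTitle: chapter.title,  // hypothetical field name
        sectionTitle: section.title,  // hypothetical field name
        html: section.html
      });
    }
  }
  return docs;
}

// Usage: a tiny book with one chapter and two sections.
const sampleBook = {
  isbn: '9783791535661',
  title: 'Alice in Wonderland',
  author: 'Lewis Carroll',
  toc: [{
    title: 'Down the Rabbit Hole',
    sections: [
      { title: 'Section 1', html: '<section>...</section>' },
      { title: 'Section 2', html: '<section>...</section>' }
    ]
  }]
};
const sectionDocs = toSectionDocs(sampleBook);
```

The trade-off is duplication: the book and chapter fields are repeated on every section document, in exchange for queries that return sections directly.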

You can try each approach in its own index and see how it works for your use cases.

score 0 · Accepted Answer

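Nested is basically a way of stuffing everything into the same document. That can be useful for searching, but it makes certain things rather difficult.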

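For example, if you are trying to find a particular section of a chapter, your query will return the right document, but that document is the whole book. I imagine that is probably not what you want, so a parent/child relationship would be the appropriate way to go.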

Or just don't bother: treat book/chapter/section as separate types within the index, and query and "join" on demand.
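A "join" on demand could be done in application code after two separate searches. The sketch below assumes each section document stores the `isbn` of its book (an assumption, since the question's mapping only has `isbn` at book level) and that both result sets have already been fetched; `joinSectionsToBooks` is a hypothetical helper, not part of any Elasticsearch client.

```javascript
// Sketch only: pair each section hit with its book by matching on `isbn`.
// `sectionHits` and `bookDocs` are plain arrays of already-fetched documents.
function joinSectionsToBooks(sectionHits, bookDocs) {
  const byIsbn = new Map(bookDocs.map(book => [book.isbn, book]));
  return sectionHits.map(section => ({
    section: section,
    book: byIsbn.get(section.isbn) || null  // null when no matching book was fetched
  }));
}

// Usage with made-up search results.
const joined = joinSectionsToBooks(
  [{ isbn: '9783791535661', title: 'Section 2' }, { isbn: 'unknown', title: 'Orphan' }],
  [{ isbn: '9783791535661', title: 'Alice in Wonderland' }]
);
```

This keeps the index simple at the cost of an extra round trip and some glue code in the application.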
