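We have a collection of structured documents. The structure is heavily inspired by the OpenXML data model. In short, a document consists of an ordered list of paragraphs, each paragraph has an id and an ordered list of runs, and each run has text content and some metadata.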

For example, the following sample document contains two paragraphs, ["Lorem ipsum", "dolor sit amet"].

{
    title: "De finibus",
    paragraphs: [
        {
            id: 1,
            runs: [
                {text: "Lorem i", metadata: {}},
                {text: "psu", metadata: {bold: true}},
                {text: "m", metadata: {}}
            ]
        },
        {
            id: 2,
            runs: [
                {text: "dolor sit amet", metadata: {}}
            ]
        }
    ]
}
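For reference, the same model can be written as a small Python sketch. It assumes the title belongs to the document and each paragraph carries an id, as the expected answers further down imply. The names `Run`, `Paragraph`, `Document`, and `paragraph_text` are illustrative only and do not come from any library. The point is just that the text a reader sees for a paragraph is the concatenation of its runs.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Paragraph:
    id: int
    runs: list[Run]


@dataclass
class Document:
    title: str
    paragraphs: list[Paragraph]


def paragraph_text(paragraph: Paragraph) -> str:
    """Join the runs of a paragraph into the text a reader would see."""
    return "".join(run.text for run in paragraph.runs)


# The sample document from above (hypothetical in-memory form).
doc = Document(
    title="De finibus",
    paragraphs=[
        Paragraph(id=1, runs=[Run("Lorem i"), Run("psu", {"bold": True}), Run("m")]),
        Paragraph(id=2, runs=[Run("dolor sit amet")]),
    ],
)

print([paragraph_text(p) for p in doc.paragraphs])
# → ['Lorem ipsum', 'dolor sit amet']
```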

Naturally, we would like to index the documents in Elasticsearch so that it can answer the following queries:

  1. Query: dolor sit

    Expected answer: in the document with title="De finibus", in the paragraph with id=2, from the 1st character of the 1st run to the 9th character of the 1st run

  2. Query: ipsum

    Expected answer: in the document with title="De finibus", in the paragraph with id=1, from the 7th character of the 1st run to the 1st character of the 3rd run

  3. Query: ipsum dolor

    Expected answer: in the document with title="De finibus", from the 7th character of the 1st run of the paragraph with id=1 to the 5th character of the 1st run of the paragraph with id=2
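To pin down what these answers mean, here is a plain-Python sketch with no Elasticsearch involved. It flattens the document into one string, remembers where each character came from, and maps a match back to (paragraph id, run number, character number), all 1-based. Joining paragraphs with a single space is an assumption, made so that "ipsum dolor" can match across the paragraph boundary; `flatten` and `locate` are hypothetical helper names.

```python
# Each paragraph is (id, [run texts]); metadata is left out because it does not affect offsets.
PARAGRAPHS = [
    (1, ["Lorem i", "psu", "m"]),
    (2, ["dolor sit amet"]),
]


def flatten(paragraphs, separator=" "):
    """Return the full text and, per character, its (paragraph id, run, char) origin.

    Run and character numbers are 1-based. Separator characters map to None.
    """
    chars, origins = [], []
    for index, (paragraph_id, runs) in enumerate(paragraphs):
        if index > 0:
            # Assumed paragraph separator; it belongs to no run.
            chars.extend(separator)
            origins.extend([None] * len(separator))
        for run_number, text in enumerate(runs, start=1):
            for char_number, char in enumerate(text, start=1):
                chars.append(char)
                origins.append((paragraph_id, run_number, char_number))
    return "".join(chars), origins


def locate(paragraphs, query):
    """Return the (start, end) origins of the first exact match, or None."""
    text, origins = flatten(paragraphs)
    start = text.find(query)
    if start == -1:
        return None
    return origins[start], origins[start + len(query) - 1]


for query in ["dolor sit", "ipsum", "ipsum dolor"]:
    print(query, "->", locate(PARAGRAPHS, query))
# dolor sit -> ((2, 1, 1), (2, 1, 9))
# ipsum -> ((1, 1, 7), (1, 3, 1))
# ipsum dolor -> ((1, 1, 7), (2, 1, 5))
```

The three printed results correspond to the three expected answers. Whatever mapping is chosen would need to preserve enough information to do this kind of offset-to-run translation on the search hits.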

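I am familiar with nested fields in Elasticsearch, and they would probably satisfy the first query. But how should we map our documents so that consecutive runs and paragraphs are joined together and the latter two queries can also be answered flexibly?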
