
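I'm new to Elasticsearch and have been struggling to get this sort to work. The general idea is to search email message threads that contain nested messages and nested participants. The goal is to display search results at the thread level, sorted for the participant who is performing the search, by either the last_received_at or the last_sent_at column depending on which mailbox they are in.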

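My understanding is that you can't sort by the value of a single child among many nested children. To get around that, I saw a few suggestions to use a custom_score query with a script and then sort on the score. My plan is to change the sort column dynamically and then run a nested custom_score query that returns the date of one of the participants as the score. I keep running into problems: the score format looks odd (for example, it always ends in 4 zeros), and it may not be returning the date I expect.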

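Below are simplified versions of the index and query in question. If anyone has any suggestions, I'd be very grateful. (FYI, I'm using Elasticsearch version 0.20.6.)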

Index:

mappings: {
    message_thread: {
        properties: {
            id: {
                type: long
            }
            subject: {
                dynamic: true
                properties: {
                    id: {
                        type: long
                    }
                    name: {
                        type: string
                    }
                }
            }
            participants: {
                dynamic: true
                properties: {
                    id: {
                        type: long
                    }
                    name: {
                        type: string
                    }
                    last_sent_at: {
                        format: dateOptionalTime
                        type: date
                    }
                    last_received_at: {
                        format: dateOptionalTime
                        type: date
                    }
                }
            }
            messages: {
                dynamic: true
                properties: {
                    sender: {
                        dynamic: true
                        properties: {
                            id: {
                                type: long
                            }
                        }
                    }
                    id: {
                        type: long
                    }
                    body: {
                        type: string
                    }
                    created_at: {
                        format: dateOptionalTime
                        type: date
                    }
                    recipient: {
                        dynamic: true
                        properties: {
                            id: {
                                type: long
                            }
                        }
                    }
                }
            }
            version: {
                type: long
            }
        }
    }
}

Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": { "participants.id": 3785 }
        },
        {
          "custom_score": {
            "query": {
              "filtered": {
                "query": { "match_all": {} },
                "filter": {
                  "term": { "participants.id": 3785 }
                }
              }
            },
            "params": { "sort_column": "participants.last_received_at" },
            "script": "doc[sort_column].value"
          }
        }
      ]
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "term": { "messages.recipient.id": 3785 }
        }
      ]
    }
  },
  "sort": [ "_score" ]
}

Solution:

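Thanks to @imotov, here is the final result. The participants were not properly nested in the index (the messages did not need to be). In addition, include_in_root was set on participants to simplify the query (participants are small records, so size is not really a concern, although @imotov also provided an example without it). He then restructured the JSON request to use a dis_max query.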

curl -XDELETE "localhost:9200/test-idx"
curl -XPUT "localhost:9200/test-idx" -d '{
  "mappings": {
    "message_thread": {
      "properties": {
        "id": {
          "type": "long"
        },
        "messages": {
          "properties": {
            "body": {
              "type": "string",
              "analyzer": "standard"
            },
            "created_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "id": {
              "type": "long"
            },
            "recipient": {
              "dynamic": "true",
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            },
            "sender": {
              "dynamic": "true",
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            }
          }
        },
        "messages_count": {
          "type": "long"
        },
        "participants": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "id": {
              "type": "long"
            },
            "last_received_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "last_sent_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "name": {
              "type": "string",
              "analyzer": "standard"
            }
          }
        },
        "subject": {
          "properties": {
            "id": {
              "type": "long"
            },
            "name": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}'
curl -XPUT "localhost:9200/test-idx/message_thread/1" -d '{
  "id" : 1,
  "subject" : {"name": "Test Thread"},
  "participants" : [
    {"id" : 87793, "name" : "John Smith", "last_received_at" : null, "last_sent_at" : "2010-10-27T17:26:58Z"},
    {"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-27T17:26:58Z", "last_sent_at" : null}
  ],
  "messages" : [{
    "id" : 1,
    "body" : "This is a test.",
    "sender" : { "id" : 87793 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-27T17:26:58Z"
  }]
}'
curl -XPUT "localhost:9200/test-idx/message_thread/2" -d '{
  "id" : 2,
  "subject" : {"name": "Elastic"},
  "participants" : [
    {"id" : 57834, "name" : "Paul Johnson", "last_received_at" : "2010-11-25T17:26:58Z", "last_sent_at" : "2010-10-25T17:26:58Z"},
    {"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-25T17:26:58Z", "last_sent_at" : "2010-11-25T17:26:58Z"}
  ],
  "messages" : [{
    "id" : 2,
    "body" : "More testing of elasticsearch.",
    "sender" : { "id" : 57834 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-25T17:26:58Z"
  },{
    "id" : 3,
    "body" : "Reply message.",
    "sender" : { "id" : 3785 },
    "recipient" : { "id" : 57834},
    "created_at" : "2010-11-25T17:26:58Z"
  }]
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
# Using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": {
                    "match_all": {}
                  },
                  "filter": {
                    "term": {
                      "participants.id": 3785
                    }
                  }
                }
              },
              "params": {
                "sort_column": "participants.last_received_at"
              },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "multi_match": {
            "query": "test",
            "fields": ["subject.name", "participants.name", "messages.body"],
            "operator": "and",
            "use_dis_max": true
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'

# Not using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": {
                    "match_all": {}
                  },
                  "filter": {
                    "term": {
                      "participants.id": 3785
                    }
                  }
                }
              },
              "params": {
                "sort_column": "participants.last_received_at"
              },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "bool": {
            "should": [{
              "match": {
                "subject.name":"test"
              }
            }, {
              "nested" : {
                "path": "participants",
                "query": {
                  "match": {
                    "name":"test"
                  }
                }
              }
            }, {
              "match": {
                "messages.body":"test"
              }
            }
            ]
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'
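For readers on Elasticsearch 0.90 or later, the same ordering can probably be expressed as a sort on the nested field itself, which avoids routing the date through the score. The following is a minimal sketch, not a tested request: it assumes the test-idx index and nested participants mapping from above, it assumes the `nested_filter` sort option available from 0.90 onward, and the `desc` order, the `SORT_BODY` variable, and the `search_sorted` helper are illustrative names that do not come from the original post.

```shell
# Sketch only: request body that sorts threads by one participant's date.
# Assumes Elasticsearch 0.90+ and the nested "participants" mapping above.
SORT_BODY=$(cat <<'EOF'
{
  "query": {
    "nested": {
      "path": "participants",
      "query": { "term": { "participants.id": 3785 } }
    }
  },
  "sort": [
    {
      "participants.last_received_at": {
        "order": "desc",
        "nested_filter": { "term": { "participants.id": 3785 } }
      }
    }
  ]
}
EOF
)

# Hypothetical helper: sends the body to the local test index.
# It is only defined here; call it once a node is running on localhost:9200.
search_sorted() {
  curl -s "localhost:9200/test-idx/message_thread/_search?pretty=true" -d "$SORT_BODY"
}
```

The `nested_filter` restricts which nested participant contributes the sort value, so only the searching participant's last_received_at is used for each thread. To sort the sent mailbox, the field name would be swapped for participants.last_sent_at.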

1 Answer


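There are a couple of issues here. You are asking about nested objects, but participants are not defined as nested objects in your mapping. The second possible issue is that the score has type float, so it might not have enough precision to represent a timestamp. If you can figure out how to fit this value into a float, you can take a look at this example: Elastic search - tagging strength (nested/child document boosting). However, if you are developing a new system, it might be prudent to upgrade to 0.90.0.Beta1, which supports sorting on nested fields.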
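To make the precision point concrete, here is a rough illustration of what happens when an epoch-millisecond timestamp is squeezed into a 32-bit float. It is a sketch under stated assumptions: the score is taken to be an IEEE 754 single-precision value, rounding is emulated with plain integer arithmetic (ties are not handled the way real hardware would), and the variable names are made up for this example. The timestamp is the 2010-10-27T17:26:58Z value from the sample documents.

```shell
# Illustration only: emulate 32-bit float rounding with integer math.
ts_ms=1288200418000   # 2010-10-27T17:26:58Z as epoch milliseconds

# A float has a 24-bit significand. This value lies between 2^40 and 2^41,
# so neighbouring floats are 2^(40-23) = 2^17 milliseconds apart.
step=$(( 1 << 17 ))

# Round to the nearest multiple of step (simple rounding, ties not special-cased).
as_float=$(( (ts_ms + step / 2) / step * step ))
error_ms=$(( ts_ms - as_float ))

echo "original: $ts_ms"
echo "as float: $as_float"
echo "error:    $error_ms ms (float spacing at this magnitude is $step ms)"
```

With a spacing of about 131 seconds between representable values, two threads whose dates fall within a couple of minutes of each other can end up with the same score, and the score printed in the response will not match the stored date exactly.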

answered 2013-04-05T21:33:32.627