python - Indexing documents with only numeric fields in Elasticsearch

Question

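I am trying to store objects in Elasticsearch that are represented only by numeric fields. In my case, each object has 300 float fields and 1 id field. I have set the id field to not_analyzed, and I am able to store the documents in ES.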

    "_index": "smart_content5",
    "_type": "doc2vec",
    "_id": "AVtAGeaZjLL5cvd8z9y7",
    "_score": 1,
    "_source": {
      "feature_227": 0.0856793,
      "feature_5": -0.115823,
      "feature_119": -0.0379987,
      "feature_145": 0.17952,
      "feature_29": 0.0444945,

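Now I want to run a query that is represented by the same 300 fields but with different values (of course), and find the documents whose 300 fields are "most similar" to the query's fields. This is essentially cosine similarity, but I am trying to do it in ES so that it is fast.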
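For reference, the cosine similarity mentioned above can be sketched in plain Python as follows. This is only a minimal illustration of the measure itself, not something ES runs; the function name `cosine_similarity` and the dict-of-floats representation are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors stored as {field_name: value} dicts.

    Hypothetical helper: assumes both dicts share the same keys and that
    neither vector is all zeros.
    """
    dot = sum(a[field] * b[field] for field in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


# Two of the stored fields from the document above, compared with a made-up query.
doc = {"feature_227": 0.0856793, "feature_5": -0.115823}
query = {"feature_227": 0.09, "feature_5": -0.10}
print(cosine_similarity(doc, query))
```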

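(1) First, is what I am trying to do even possible?

(2) Second, I explored ES's function_score feature and tried using it, but it returns a maximum match score of 0.0!

Any comments on what I should use, and on what I might be doing wrong in (2), would be appreciated.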

score 1 · Accepted Answer

I think you still need function_score, but like this (it works for me):

{
  "query": {
    "function_score": {
      "query": {},
      "functions": [
        {
          "gauss": {
            "feature_227": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_5": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_119": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_145": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        },
        {
          "gauss": {
            "feature_29": {
              "origin": "0",
              "scale": "0.5"
            }
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
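Writing 300 of these clauses by hand is impractical, so the body could be generated instead. The following is a rough sketch under a few assumptions: the helper name `build_gauss_similarity_query` is made up, each `origin` is set to the query vector's value for that field (the example above uses "0" everywhere), and `match_all` is swapped in for the empty inner query so that the body states explicitly that every document is scored.

```python
import json


def build_gauss_similarity_query(query_vector, scale=0.5):
    """Build a function_score body with one gauss decay clause per field.

    Hypothetical helper: query_vector maps field names to floats. Values
    are sent as strings to mirror the hand-written query above.
    """
    functions = [
        {"gauss": {field: {"origin": str(value), "scale": str(scale)}}}
        for field, value in sorted(query_vector.items())
    ]
    return {
        "query": {
            "function_score": {
                # Assumption: match_all replaces the empty query from the answer.
                "query": {"match_all": {}},
                "functions": functions,
                # Sum the per-field decay scores, as in the answer.
                "score_mode": "sum",
            }
        }
    }


# Made-up query values for three of the 300 fields.
query_vector = {"feature_227": 0.08, "feature_5": -0.12, "feature_119": -0.04}
print(json.dumps(build_gauss_similarity_query(query_vector), indent=2))
```

Note that summing gauss decay scores rewards documents whose individual fields lie close to the query values. That is a per-field closeness score, not cosine similarity, so the ranking may differ from what a true cosine comparison would give.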
