elasticsearch - Name search in Elasticsearch

Question

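I have created an index in Elasticsearch that stores a person's full name: first name and surname. I want to run full-text search on that field, so I indexed it with an analyzer.

My problem is that if I search for "John Rham Rham"

and the index also contains "John Rham Rham Luck", that value scores higher than "John Rham Rham". Is it possible to get a better score for the exact value than for a field that has more terms in the string?

Thanks in advance!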

score 0 · Accepted Answer

I worked out a small example (I am assuming you are running ES 5.x, because of the difference in scoring):

DELETE test
PUT test
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "text",
          "similarity": "my_bm25",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

POST test/test/1
{
  "name": "John Rham Rham"
}
POST test/test/2
{
  "name": "John Rham Rham Luck"
}
GET test/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": {
            "query": "John Rham Rham",
            "operator": "and"
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": "_score / doc['name.length'].getValue()"
          }
        }
      ]
    }
  }
}

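This code does the following:

Replaces the default BM25 similarity with a custom one that sets the b parameter (field-length normalization) to 0. You could also switch the similarity to "classic" to go back to TF/IDF, which does not use this BM25 normalization (classic TF/IDF still applies its own field-length norm).
Creates a sub-field on your name field that counts the number of tokens in the name.
Divides the score by that token count, so that a name with fewer tokens ranks higher.

This will result in: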
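
To see why setting b to 0 matters, here is a minimal Python sketch of the term-frequency part of the textbook BM25 formula. It is only an illustration: it ignores IDF and the lossy way Lucene stores field lengths, and the function name `bm25_tf` and the value of `avg_len` are assumptions made for this example, not anything Elasticsearch exposes.

```python
def bm25_tf(freq, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of textbook BM25 for one term in one field."""
    # b controls how strongly the field length scales the saturation term.
    length_norm = 1 - b + b * doc_len / avg_len
    return freq * (k1 + 1) / (freq + k1 * length_norm)


# Assumed average field length for the two example documents (3 and 4 tokens).
avg_len = 3.5

# "rham" occurs twice in both names; only the field length differs.
short_default = bm25_tf(2, 3, avg_len)        # "John Rham Rham"
long_default = bm25_tf(2, 4, avg_len)         # "John Rham Rham Luck"

# With b = 0 the field length drops out of the formula entirely.
short_flat = bm25_tf(2, 3, avg_len, b=0)
long_flat = bm25_tf(2, 4, avg_len, b=0)

# The script_score step then divides by the token count from name.length.
short_final = short_flat / 3
long_final = long_flat / 4

print(short_default, long_default)
print(short_flat, long_flat)
print(short_final, long_final)
```

With b at 0 both documents get the same term-frequency contribution, so the division by the token count is the only thing that makes the shorter name win.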

"hits": {
    "total": 2,
    "max_score": 0.3596026,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.3596026,
        "_source": {
          "name": "John Rham Rham"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.26970196,
        "_source": {
          "name": "John Rham Rham Luck"
        }
      }
    ]
  }
}
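
As a quick sanity check on these numbers, the short Python sketch below (variable names are just for illustration) multiplies each final score by its token count. If the two raw match scores were equal before the script ran, the products should agree and the ratio of the final scores should be about 3/4.

```python
import math

# Final scores copied from the response above.
score_short = 0.3596026    # "John Rham Rham", 3 tokens
score_long = 0.26970196    # "John Rham Rham Luck", 4 tokens

# Undo the division done by the script_score function.
raw_short = score_short * 3
raw_long = score_long * 4

ratio = score_long / score_short

print(raw_short, raw_long, ratio)
print(math.isclose(raw_short, raw_long, rel_tol=1e-6))
```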

I am not sure this is the best way to do it, but it may point you in the right direction :)
