我们正在尝试复制这个 ES 插件https://github.com/MLnick/elasticsearch-vector-scoring。原因是 AWS ES 不允许安装任何自定义插件。该插件只是在做点积和余弦相似度,所以我猜想在painless
脚本中复制它应该很简单。看起来groovy
脚本在 5.0 中已弃用。
这是插件的源代码。
/**
* @param params index that a scored are placed in this parameter. Initialize them here.
*/
@SuppressWarnings("unchecked")
private PayloadVectorScoreScript(Map<String, Object> params) {
params.entrySet();
// get field to score
field = (String) params.get("field");
// get query vector
vector = (List<Double>) params.get("vector");
// cosine flag
Object cosineParam = params.get("cosine");
if (cosineParam != null) {
cosine = (boolean) cosineParam;
}
if (field == null || vector == null) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": field or vector parameter missing!");
}
// init index
index = new ArrayList<>(vector.size());
for (int i = 0; i < vector.size(); i++) {
index.add(String.valueOf(i));
}
if (vector.size() != index.size()) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": index and vector array must have same length!");
}
if (cosine) {
// compute query vector norm once
for (double v: vector) {
queryVectorNorm += Math.pow(v, 2.0);
}
}
}
@Override
public Object run() {
float score = 0;
// first, get the ShardTerms object for the field.
IndexField indexField = this.indexLookup().get(field);
double docVectorNorm = 0.0f;
for (int i = 0; i < index.size(); i++) {
// get the vector value stored in the term payload
IndexFieldTerm indexTermField = indexField.get(index.get(i), IndexLookup.FLAG_PAYLOADS);
float payload = 0f;
if (indexTermField != null) {
Iterator<TermPosition> iter = indexTermField.iterator();
if (iter.hasNext()) {
payload = iter.next().payloadAsFloat(0f);
if (cosine) {
// doc vector norm
docVectorNorm += Math.pow(payload, 2.0);
}
}
}
// dot product
score += payload * vector.get(i);
}
if (cosine) {
// cosine similarity score
if (docVectorNorm == 0 || queryVectorNorm == 0) return 0f;
return score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
} else {
// dot product score
return score;
}
}
我试图从从索引中获取一个字段开始。但我遇到了错误。
这是我的索引的形状。
我已启用delimited_payload_filter
"settings" : {
"analysis": {
"analyzer": {
"payload_analyzer": {
"type": "custom",
"tokenizer":"whitespace",
"filter":"delimited_payload_filter"
}
}
}
}
我有一个称为@model_factor
存储向量的字段。
{
"movies" : {
"properties" : {
"@model_factor": {
"type": "text",
"term_vector": "with_positions_offsets_payloads",
"analyzer" : "payload_analyzer"
}
}
}
}
这是文件的形状
{
"@model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
"name": "Test 1"
}
这是我使用脚本的方式
{
"query": {
"function_score": {
"query" : {
"query_string": {
"query": "*"
}
},
"script_score": {
"script": {
"inline": "def termInfo = doc['_index']['@model_factor'].get('1', 4);",
"lang": "painless",
"params": {
"field": "@model_factor",
"vector": [0.1,2.3,-1.6,0.7,-1.3],
"cosine" : true
}
}
},
"boost_mode": "replace"
}
}
}
这是我得到的错误。
"failures": [
{
"shard": 2,
"index": "test",
"node": "ShL2G7B_Q_CMII5OvuFJNQ",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"caused_by": {
"type": "wrong_method_type_exception",
"reason": "wrong_method_type_exception: cannot convert MethodHandle(List,int)int to (Object,String)String"
},
"script_stack": [
"termInfo = doc['_index']['@model_factor'].get('1',4);",
" ^---- HERE"
],
"script": "def termInfo = doc['_index']['@model_factor'].get('1',4);",
"lang": "painless"
}
}
]
问题是如何访问索引字段以@model_factor
进行无痛脚本?