6

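I have an Elasticsearch index with an exact-match field. Somehow I get a lot of similar results (which I don't mind), and those similar results are sorted ahead of the exact match (which I do mind).

Can someone explain what is happening and how to fix it?

My mapping looks like this: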

"exact":{
  "type":"string",
  "boost":10.0,
  "analyzer":"keyword"
},

My query, searching for "AAPL P JAN 2014 885,00", looks like this:

{
  "size" : 21,
  "query" : {
    "field" : {
      "exact" : "AAPL P JAN 2014 885,00"
    }
  },
  "explain" : true,
  "sort" : [ {
    "_score" : {
      "order" : "desc"
    }
  } ],
  "facets" : {
    "category" : {
      "terms" : {
        "field" : "category",
        "size" : 10
      }
    }
  }
}

The returned documents end up in this order:

  • {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"}
  • {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"}
  • {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"}

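And so on, with the exact match a bunch of results further down.

Can someone explain to me why the exact match does not end up at the top?

In case it helps, the search result with the full explanation is below.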

"hits" : [ {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL",
  "_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"},
  "_explanation" : {
    "value" : 1306.8339,
    "description" : "product of:",
    "details" : [ {
      "value" : 6534.169,
      "description" : "sum of:",
      "details" : [ {
        "value" : 6534.169,
        "description" : "weight(exact:AAPL in 9096), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 25272.875,
          "description" : "fieldWeight(exact:AAPL in 9096), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 4096.0,
            "description" : "fieldNorm(field=exact, doc=9096)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*PUT*20140118*675",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 18), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 18), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=18)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*CALL*20140118*500",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 383), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 383), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=383)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_id" : "AAPL*PUT*20140118*940",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 794), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 794), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=794)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}

And, just in case, here is what happens when I analyze the data I am about to store:

curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING  P JUN 2013 6.00'
{
  "tokens" : [ {
    "token" : "ING  P JUN 2013 6.00",
    "start_offset" : 0,
    "end_offset" : 20,
    "type" : "word",
    "position" : 1
  } ]
}

5 Answers

2

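I'm not sure this is technically the best approach, but if you are only after one specific answer from Elasticsearch, you can use a filter with a script that looks for the exact match.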

{
  "from" : 0,
  "size" : 1,
  "query" : { 
    "text_phrase" : { 
      "title" : "AAPL P JAN 2014 885,00"
    } 
  },
  "filter" : { 
    "script" : { 
      "script" : "_source.exact.contains(x)", 
      "params" : { 
        "x" : "AAPL P JAN 2014 885,00" 
      }  
    } 
  }
}

I have used this to fetch a single known entry from Elasticsearch, and it works well for me.
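For anyone wondering what that script actually checks: assuming `_source.exact` is exposed to the script as a list of values (as in the documents from the question), `contains(x)` is an element-equality test, not a substring test. A rough Python analogue is sketched below; `script_filter` is a made-up helper name and the two sample documents are copied from the question.

```python
# Rough Python analogue of the script filter `_source.exact.contains(x)`.
# Assumption: `exact` is a list, so the check is element equality,
# not a substring search inside each value.

def script_filter(source, x):
    """Return True when x equals one of the values stored in source['exact']."""
    return x in source["exact"]


docs = [
    {"exact": ["APPLE INC", "US0378331005", "AAPL", "73773"]},
    {"exact": ["AAPL", "73773", "AAPL P JAN 2014 675,00"]},
]

# A full value passes only for the document that stores it.
print([script_filter(d, "AAPL P JAN 2014 675,00") for d in docs])
# A partial value passes for no document.
print([script_filter(d, "AAPL P JAN") for d in docs])
```

So the filter only lets through documents that store the complete string as one of their values, which is why it works for fetching a single known entry.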

answered 2013-07-23T16:29:34.730
1

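I think you have already found the answer; I just want to add some more information for others with the same problem.

You are using the field query. From the elasticsearch docs:

Field query:

A query that executes a query string against a specific field. It is a simplified version of the query_string query (by setting the default_field to the field this query is executed against).

I believe the query_string query is meant for text, i.e. it does a lot of processing on the query, making it fuzzy and so on.

What you want to use (and I think you found this out) is a term query, which does nothing to the search phrase and therefore only gives you exact matches.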
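As a minimal sketch of that suggestion, here is the request body built as a Python dict and printed as JSON. The helper name `exact_term_query` is made up for this illustration; the field name `exact`, the search value and the size of 21 come from the question. The sort and facets parts of the original query are left out for brevity.

```python
import json


def exact_term_query(field, value, size=21):
    """Build a search body whose term query matches `value` as one whole term.

    A term query is not analyzed, so the value is compared as-is against
    the tokens stored in the index.
    """
    return {
        "size": size,
        "query": {"term": {field: value}},
    }


body = exact_term_query("exact", "AAPL P JAN 2014 885,00")
print(json.dumps(body, indent=2))
```

Because the keyword analyzer stores each value as one unmodified token, the unanalyzed term should line up with the stored token only when the whole string is identical.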

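Note: analysis happens at two different times, index time and query time. Setting "analyzer": "keyword" only seems to affect search-time queries of the form "when using a query string to search" (quoting the elasticsearch docs). I must admit I don't know exactly what that means. My guess is query_string, but it could also mean searches like http://../_search?q=exact:{query here}.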

answered 2013-06-05T10:29:49.930
1

You should not analyze your id fields.

Define your field as:

"exact":{
   "type":"string",
   "index":"not_analyzed"
 }

Have a look at "Finding Exact Values".

answered 2015-10-12T09:32:10.050
0

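All three documents match on the same single term, "AAPL", as you can see from the explain output. The term appears once in each document (tf=1) and occurs in 211 of 37299 documents (idf=6.1701355), so the scores differ only through the field norm (4096 versus 512). The field norm is that high because you are using index-time boosting (the boost of 10 in your mapping). That is no big deal here, since the match is always on the same field. It only means that if you also had matches on other fields, exact would almost always win, which probably makes sense in your case.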
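The numbers in the explain output can be recombined by hand to see where each score comes from. The sketch below just multiplies the factors in the order the explain tree lists them; `explain_score` is a made-up helper name and all constants are copied from the explain output in the question.

```python
import math


def explain_score(tf, idf, field_norm, query_norm, coord):
    """Recombine the factors of the explain tree into the final score."""
    query_weight = idf * query_norm        # queryWeight(exact:AAPL)
    field_weight = tf * idf * field_norm   # fieldWeight(exact:AAPL in <doc>)
    return query_weight * field_weight * coord


IDF = 6.1701355         # idf(docFreq=211, maxDocs=37299)
QUERY_NORM = 0.0419026  # queryNorm
COORD = 1 / 5           # coord(1/5)

top = explain_score(1.0, IDF, 4096.0, QUERY_NORM, COORD)  # first hit
rest = explain_score(1.0, IDF, 512.0, QUERY_NORM, COORD)  # later hits

print(round(top, 2), round(rest, 2), round(top / rest, 6))
```

The results should land close to the reported 1306.8339 and 163.35423 (small differences come from rounding in the printed explain values), and the ratio between them is simply 4096 / 512, since the field norm is the only factor that changes between the hits.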

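The problem, though, is that AAPL P JAN 2014 885,00 is not an exact match for any of the documents you show. What I see is that only one of the five terms in your query matches, which the coord in your explain output confirms: coord(1/5).

The keyword analyzer does seem to be applied, but as you can see from the returned documents, you are not sending the content of the exact field as a single value but as an array of values. Each item is not tokenized, because you are using the keyword analyzer, but you still end up with multiple tokens. I guess you need to check how you are indexing the documents.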
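To make the coord(1/5) concrete, here is a simplified Python simulation. It assumes the field query splits the search text on whitespace before each piece reaches the keyword analyzer, which is what the explain output suggests; it is not the actual parser. The helper names are made up, the `returned` values are copied from the question, and the `wanted` document is hypothetical.

```python
def field_query_terms(query_text):
    """Simplified stand-in for the query-string parser: split on whitespace."""
    return query_text.split()


def keyword_tokens(values):
    """Keyword analyzer: each array element becomes exactly one token."""
    return set(values)


def coord(query_text, values):
    """Return (matched terms, total query terms) for one document."""
    terms = field_query_terms(query_text)
    tokens = keyword_tokens(values)
    matched = [t for t in terms if t in tokens]
    return len(matched), len(terms)


query = "AAPL P JAN 2014 885,00"
returned = ["AAPL", "73773", "AAPL P JAN 2014 675,00"]  # from the question
wanted = ["AAPL", "73773", "AAPL P JAN 2014 885,00"]    # hypothetical target

print(coord(query, returned))  # split query against a returned document
print(coord(query, wanted))    # split query against the intended document
# Comparing the unsplit string instead separates the two documents.
print(query in keyword_tokens(returned), query in keyword_tokens(wanted))
```

Under this assumption, both documents match only through the lone "AAPL" piece, so the intended document gets no advantage from containing the full string; only a comparison on the whole, unsplit value tells them apart.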

answered 2013-05-16T17:32:47.603
0

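The reason your keyword analyzer seems to be ignored in the search query is that ES tokenizes this string twice: first it runs its DSL tokenizer, and then it runs the tokenizer specified in the mapping on the result. This is explained in more detail in this post: http://paulsabou.com/blog/2012/03/25/advanced-exact-matching-with-elastic-search/

answered 2013-10-29T12:01:36.347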