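I have implemented a function_score query in my Document model. It uses a clicks field that tracks the number of views for each document. Now I want documents with more clicks to get higher priority in the search results and appear at the top.

My document.rb code: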

require 'elasticsearch/model'



 def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        function_score: {
          query: {
            multi_match: {
              query: query,
              fields: ['name', 'service'],
              fuzziness: "AUTO"
            }
          },
          field_value_factor: {
            field: 'clicks',
            modifier: 'log1p',
            factor: 2 
          }
        }
      }
    }
  )
 end

 settings index: { "number_of_shards": 1, 
  analysis: {
    analyzer: {
      edge_ngram_analyzer: { type: "custom", tokenizer: "standard", filter: 
                       ["lowercase", "edge_ngram_filter", "stop", "kstem" ] },
        }
    },
    filter: { ascii_folding: { type: 'asciifolding', preserve_original: true
                             }, 
              edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram:
                              "20" } 
  }
 } do
  mapping do
    indexes :name, type: "string", analyzer: "edge_ngram_analyzer", 
             term_vector: "with_positions"
    indexes :service, type: "string", analyzer: "edge_ngram_analyzer", 
             term_vector: "with_positions"
  end 
 end

end

Here is the search view:

<h1>Document Search</h1>

 <%= form_for search_path, method: :get do |f| %>
 <p>
  <%= f.label "Search for" %>
  <%= text_field_tag :query, params[:query] %>
  <%= submit_tag "Go", name: nil %>
 </p>
<% end %>
<% if @documents %>
  <ul class="search_results">
    <% @documents.each do |document| %>
    <li>
       <h3>
          <%= link_to document.name, controller: "documents", action: "show", 
         id: document._id %>   
       </h3>   
   </li>
   <% end %>
 </ul>
<% else %>
 <p>Your search did not match any documents.</p>
<% end %>
 <br/>

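When I search for Estamp, I get the results in the following order:

 Franking and Estamp  # clicks 5
 Notary and Estamp    # clicks 8

Clearly, even though Notary and Estamp has more clicks, it does not appear at the top of the search results. How can I achieve this?

This is what I get when I run it in the console: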

POST _search

"hits": {
    "total": 2,
    "max_score": 1.322861,
    "hits": [
        {
            "_index": "documents",
            "_type": "document",
            "_id": "13",
            "_score": 1.322861,
            "_source": {
                "id": 13,
                "name": "Franking and Estamp",
                "service": "Estamp",
                "user_id": 1,
                "clicks": 7
            }
        },
        {
            "_index": "documents",
            "_type": "document",
            "_id": "14",
            "_score": 0.29015404,
            "_source": {
                "id": 14,
                "name": "Notary and Estamp",
                "service": "Notary",
                "user_id": 1,
                "clicks": 12
            }
        }
    ]
}

Here the documents' scores are not updated according to the clicks.

1 Answer

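It is not easy to answer without seeing your indexed data. But looking at the query, one thing comes to mind, and I will show it with a short example:

Example 1:

I indexed the following documents: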

{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"Notary and Estamp", "service" :"text", "clicks": 8}

Running the same query you provided gave the following result:

"hits": {
    "total": 2,
    "max_score": 4.333119,
    "hits": [
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwkems7jEvHyvnccV",
            "_score": 4.333119,
            "_source": {
                "name": "Notary and Estamp",
                "service": "text",
                "clicks": 8
            }
        },
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwo6ds7jEvHyvnccW",
            "_score": 3.6673431,
            "_source": {
                "name": "Franking and Estampy",
                "service": "text",
                "clicks": 5
            }
        }
    ]
}

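So everything is fine: the document with 8 clicks gets a higher score (the _score field value) and the order is correct.

Example 2:

I noticed that in your query the name field is boosted with a high factor. So what happens if I index the following data?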

{"name":"Franking and Estampy", "service" :"text", "clicks": 5}
{"name":"text", "service" :"Notary and Estamp", "clicks": 8}

Result:

"hits": {
    "total": 2,
    "max_score": 13.647502,
    "hits": [
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwo6ds7jEvHyvnccW",
            "_score": 13.647502,
            "_source": {
                "name": "Franking and Estampy",
                "service": "text",
                "clicks": 5
            }
        },
        {
            "_index": "script",
            "_type": "test",
            "_id": "AV2iwkems7jEvHyvnccV",
            "_score": 1.5597181,
            "_source": {
                "name": "text",
                "service": "Notary and Estamp",
                "clicks": 8
            }
        }
    ]
}

Although Franking and Estampy has only 5 clicks, it scores much higher than the second document, which has more clicks.
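
To see why the text score can outweigh the clicks, here is a rough sketch of the arithmetic. It assumes the default boost_mode (multiply) and that the log1p modifier computes log10(1 + factor * clicks). The query scores 2.0 and 1.0 are made-up values for illustration, not taken from any index, and the helper names are mine.

```ruby
# Sketch of how field_value_factor with modifier "log1p" scales a query score.
# Assumption: boost = log10(1 + factor * clicks), combined with the query
# score by multiplication (the default boost_mode).
def log1p_boost(clicks, factor: 2)
  Math.log10(1 + factor * clicks)
end

def final_score(query_score, clicks, factor: 2)
  query_score * log1p_boost(clicks, factor: factor)
end

# Hypothetical query scores: the first document matches the text twice as well.
fewer_clicks = final_score(2.0, 5)
more_clicks  = final_score(1.0, 8)

puts format('boost for 5 clicks: %.3f', log1p_boost(5))
puts format('boost for 8 clicks: %.3f', log1p_boost(8))
puts format('final scores: %.3f vs %.3f', fewer_clicks, more_clicks)
```

Under these assumptions, going from 5 to 8 clicks raises the boost by less than 20%, so any larger difference in the text score decides the order.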

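So the point is that, in your query, the number of clicks is not the only factor that affects the score and the final order of the documents. Without the real data this is only a guess on my part. You can run the query yourself with a REST client and inspect the scores, the fields, and the matched phrases.

Update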
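
One way to inspect how each score was computed is to send the same query with explain set to true, which asks Elasticsearch to include a scoring breakdown for every hit. The sketch below only builds the request body. The explained_search_body helper name is made up for this example, and the output still has to be POSTed to the index's _search endpoint with whatever client you use.

```ruby
require 'json'

# Builds the query from the question with "explain" enabled, so that each hit
# in the response carries an "_explanation" section.
def explained_search_body(text)
  {
    explain: true,
    query: {
      function_score: {
        query: {
          multi_match: { query: text, fields: %w[name service], fuzziness: 'AUTO' }
        },
        field_value_factor: { field: 'clicks', modifier: 'log1p', factor: 2 }
      }
    }
  }
end

puts JSON.pretty_generate(explained_search_body('Estamp'))
```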

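Based on your search results, you can see that the document with id=13 has the term Estamp in both fields (name and service). That is why this document gets a higher score: in the scoring algorithm, having the term in both fields matters more than having a higher click count. If you want the clicks field to have a bigger impact on the score, try experimenting with factor (it should probably be higher) and with modifier ("modifier": "square" may work in your case). The possible modifier values are listed in the Elasticsearch function_score documentation.

Try this combination: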

{
  "query": {
    "function_score": { 
      ... // same as before
      },
      "field_value_factor": { 
        "field": "clicks" ,
        "modifier": "square",
        "factor": 3 
      }
    }
  }
}
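
For a rough feel of what this changes, the sketch below compares the two modifiers on the click counts from your console output (7 and 12). It assumes the modifier is applied to factor * clicks, so that square gives (factor * clicks)^2 and log1p gives log10(1 + factor * clicks). The helper names are made up for this illustration.

```ruby
# Boost under "modifier": "square" (assumed to be (factor * clicks) squared).
def boost_square(clicks, factor: 3)
  (factor * clicks)**2
end

# Boost under "modifier": "log1p" (assumed to be log10(1 + factor * clicks)).
def boost_log1p(clicks, factor: 2)
  Math.log10(1 + factor * clicks)
end

# How much more boost the 12-click document gets than the 7-click one.
square_ratio = boost_square(12).to_f / boost_square(7)
log1p_ratio  = boost_log1p(12) / boost_log1p(7)

puts format('square: %d vs %d (ratio %.2f)', boost_square(12), boost_square(7), square_ratio)
puts format('log1p:  %.3f vs %.3f (ratio %.2f)', boost_log1p(12), boost_log1p(7), log1p_ratio)
```

Under these assumptions, square widens the gap between the two documents from roughly 1.19x to roughly 2.94x. Whether that is enough depends on how far apart the text scores are.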

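Update 2 - scoring based only on the number of clicks

If the only parameter that should affect the score is the value of the clicks field, you can try "boost_mode": "replace". In that case only the function score is used and the query score is ignored, so the frequency of the Estamp term in the name and service fields has no effect on the score. Try this query: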

{
  "query": {
    "function_score": { 
      "query": { 
        "multi_match": {
          "query":    "Estamp",
          "fields": [ "name", "service"],
          "fuzziness": "AUTO"
        }
      },
      "field_value_factor": { 
        "field": "clicks",
        "factor": 1
      },
      "boost_mode": "replace"
    }
  }
}

It gave me:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 5,
        "hits": [
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2nI0HkJPYn0YKQxRvd",
                "_score": 5,
                "_source": {
                    "name": "Notary and Estamp",
                    "service": "Notary",
                    "clicks": 5
                }
            },
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2nIwKvJPYn0YKQxRvc",
                "_score": 4,
                "_source": {
                    "name": "Franking and Estamp",
                    "service": "Estamp",
                    "clicks": 4
                }
            }
        ]
    }
}

This may be the one you are looking for (note that the values "_score": 5 and "_score": 4 match the number of clicks).
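
In the Ruby model from the question, the same search definition could be written as a plain hash. This is only a sketch: clicks_only_search is a name I made up, and it assumes you pass its result to __elasticsearch__.search the same way your current self.search does.

```ruby
# Search definition that ranks matches purely by the clicks field.
# boost_mode "replace" discards the multi_match score; multi_match still
# decides which documents match at all.
def clicks_only_search(text)
  {
    query: {
      function_score: {
        query: {
          multi_match: { query: text, fields: %w[name service], fuzziness: 'AUTO' }
        },
        field_value_factor: { field: 'clicks', factor: 1 },
        boost_mode: 'replace'
      }
    }
  }
end

definition = clicks_only_search('Estamp')
puts definition[:query][:function_score][:boost_mode]
```

Inside the model this would be called as __elasticsearch__.search(clicks_only_search(query)).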

Answered 2017-08-02T12:26:07.980