
I would like to implement relevance feedback in Solr. Solr already has a More Like This feature: Given a single document, return a set of similar documents ranked by similarity to the single input document. Is it possible to configure Solr's More Like This feature to behave like More Like Those? In other words: Given a set of documents, return a list of documents similar to the input set (ranked by similarity).

According to the answer to this question, turning Solr's More Like This into More Like Those can be done as follows:

  1. Take the URL of a query whose result set contains the specified documents. For example, the URL http://solrServer:8983/solr/select?q=id:1%20id:2%20id:3 returns the response to the query id:1 id:2 id:3, which is practically the concatenation of documents 1, 2 and 3.
  2. Put the above URL (the concatenation of the specified documents) in the stream.url GET parameter of the More Like This handler: http://solrServer:8983/solr/mlt?mlt.fl=text&mlt.mintf=0&stream.url=http://solrServer:8983/solr/select%3Fq=id:1%20id:2%20id:3. The More Like This handler now treats the concatenation of documents 1, 2 and 3 as a single input document and returns a ranked set of documents similar to the concatenation.
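A small Python sketch of the two steps, assuming the host `solrServer:8983`, the field `text` and an `/mlt` handler as in the example URLs. Solr only honours `stream.url` when remote streaming is enabled in `solrconfig.xml`. The helper names `select_url` and `mlt_url` are hypothetical. The main point is that the nested step-1 URL has to be percent-encoded as a whole when it becomes a parameter value:

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

SOLR = "http://solrServer:8983/solr"  # host/port taken from the example URLs

def select_url(ids):
    """Step 1: URL of a query whose result set holds exactly the given documents."""
    q = " ".join(f"id:{i}" for i in ids)
    return f"{SOLR}/select?" + urlencode({"q": q}, quote_via=quote)

def mlt_url(ids, field="text"):
    """Step 2: MLT URL that streams the step-1 response in as one input document."""
    params = {
        "mlt.fl": field,
        "mlt.mintf": 0,
        "stream.url": select_url(ids),  # the nested URL is percent-encoded as a whole
    }
    return f"{SOLR}/mlt?" + urlencode(params, quote_via=quote)

url = mlt_url([1, 2, 3])
print(url)
# Decoding stream.url once gives back the step-1 URL.
print(parse_qs(urlsplit(url).query)["stream.url"][0])
```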

This is a pretty bad implementation: treating the set of input documents as one big document discriminates against short documents, because each short document makes up only a small portion of the combined document.
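A toy illustration of the problem, using made-up token lists and raw term frequency only (no IDF, so this is a simplification of what MLT actually ranks by): once the documents are concatenated, the short document's terms are swamped by the long one's, and a top-n cut keeps none of them.

```python
from collections import Counter

def top_terms(tokens, n):
    """The n most frequent terms by raw TF (ties keep first-seen order)."""
    return [term for term, _ in Counter(tokens).most_common(n)]

# Made-up documents: a long one (1000 tokens) and a short one (3 tokens).
long_doc = "solr lucene index shard query " * 200
short_doc = "rocchio relevance feedback"

# "More Like Those" by concatenation: both documents become one token stream.
merged = (long_doc + short_doc).split()

print(top_terms(merged, 5))                  # only terms from the long document
print(len(short_doc.split()) / len(merged))  # the short document's share of all tokens
```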

Solr's More Like This feature is implemented by a variation of the Rocchio algorithm: it takes the top 20 terms of the (single) input document (the terms with the highest TF-IDF values) and uses those terms as the modified query, boosted according to their TF-IDF. I am looking for a way to configure Solr's More Like This feature to take multiple documents as its input, extract the top n terms from each input document and query the index with those terms boosted according to their TF-IDF.
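A minimal, self-contained sketch of the desired behaviour, under several assumptions: term statistics come from a toy in-memory corpus rather than from Solr's index, TF is normalized by document length and IDF is the textbook `ln(N / df)` (not Lucene's exact scoring formula), and per-document weights are averaged over the feedback set (the centroid part of Rocchio, without the original-query and negative terms). The function names are hypothetical; the output is a boosted query string in Lucene query-parser syntax (`field:term^boost`).

```python
import math
from collections import Counter

def tfidf(doc, doc_sets):
    """TF-IDF weights for one tokenized document.

    Assumption: tf = count / doc length, idf = ln(N / df).
    """
    n_docs = len(doc_sets)
    weights = {}
    for term, count in Counter(doc).items():
        df = sum(1 for s in doc_sets if term in s) or 1  # guard against df == 0
        weights[term] = (count / len(doc)) * math.log(n_docs / df)
    return weights

def more_like_those_query(feedback_docs, corpus, n=20, field="text"):
    """Take the top-n terms of EACH feedback document and merge them into one query."""
    doc_sets = [set(d) for d in corpus]
    combined = Counter()
    for doc in feedback_docs:
        weights = tfidf(doc, doc_sets)
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for term, w in top:
            combined[term] += w / len(feedback_docs)  # average over the feedback set
    # field:term^boost clauses, combined with the query parser's default OR operator.
    return " ".join(f"{field}:{t}^{w:.3f}" for t, w in combined.most_common() if w > 0)

corpus = [
    "solr lucene index shard query solr lucene index".split(),
    "rocchio relevance feedback".split(),
    "solr query parser".split(),
    "pasta recipe tomato".split(),
]
# Documents 0 (long) and 1 (short) are the relevance feedback.
q = more_like_those_query([corpus[0], corpus[1]], corpus, n=2)
print(q)
```

Unlike the concatenation approach, the short feedback document contributes its own top terms at full per-document strength. The resulting string could be sent as `q` to an ordinary `/select` request; you would probably also want to exclude the feedback documents themselves, for example with negative `-id:...` clauses.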

Is it possible to configure More Like This to behave that way? If not, what is the best way to implement relevance feedback in Solr?

1 Answer

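Unfortunately, the MLT handler cannot be configured to work this way.

One approach is to implement a custom SearchComponent and register it with a (dedicated) SearchHandler.

I've done something similar, and it's quite easy if you look at the original implementation of the MLT component.

The hardest part is synchronizing the results from the different shard servers, but you can skip that if you don't use sharding.

I also strongly recommend using your own parameter names in your implementation to prevent conflicts with other components.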

answered 2013-06-28T10:40:50.407