
I have a Lucene index in which each Lucene document is a single page, keyed by pageId (the unique key), and one real document can have multiple pages. When a user performs a search, it returns the pages that match the search criteria.

I am using Lucene.Net 2.9.2

We have two problems.

1. The index is around 800 GB and holds 130 million rows (pages), so searching was really slow: every query took more than a minute, even though we only have to return a limited number of rows at a time.

To overcome the performance issue I moved to Solr, which resolved it. That is quite strange, as I am not using any extra functionality provided by Solr such as sharding. Could it be that Lucene.Net 2.9.2 does not match the performance of the same version of the Java Lucene? In any case, I now have another issue.

2. Each Lucene document is one page, but I want to show results grouped by real document. The number of results returned should be configurable in terms of real documents, not pages, because that is how I want to show them to the user.

So let's say I want 20 real documents and all the pages in them that match the search criteria (it doesn't matter if one document has 100 pages and another just 1).
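
To make the requirement concrete, here is a minimal sketch of the grouping done on the client side. It assumes the hits arrive in relevance order as `(doc_id, page_id)` pairs; those field names and the function `group_pages_by_document` are hypothetical, not part of Lucene or Solr.

```python
from collections import OrderedDict


def group_pages_by_document(hits, max_docs=20):
    """Group ranked page hits by their parent ("real") document.

    `hits` is an iterable of (doc_id, page_id) pairs in relevance order
    (hypothetical field names). The first `max_docs` distinct documents
    are kept, together with every matching page that belongs to them.
    """
    groups = OrderedDict()
    for doc_id, page_id in hits:
        if doc_id not in groups:
            if len(groups) >= max_docs:
                # Document limit reached: skip pages of any new document,
                # but keep scanning for pages of documents already kept.
                continue
            groups[doc_id] = []
        groups[doc_id].append(page_id)
    return groups


if __name__ == "__main__":
    ranked_hits = [("A", 1), ("B", 7), ("A", 2), ("C", 3), ("B", 8)]
    for doc_id, pages in group_pages_by_document(ranked_hits, max_docs=2).items():
        print(doc_id, pages)
```

This has to walk the whole hit list to collect every page of the kept documents, so on an index of this size I would want the grouping to happen inside the search engine rather than in the client.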

From what I could gather from the Solr forums, this can be achieved with the SOLR-236 patch (field collapsing), but I have not been able to apply the patch correctly to trunk (it gives lots of errors).

This is really important for me and I don't have much time, so can someone please either send me a Solr 1.4.1 binary with this patch applied, or tell me if there is any other way?

I would really appreciate it. Thanks!!


3 Answers


You could also take a look at SOLR-1682: Implement CollapseComponent. I haven't tested it, but as far as I know it also addresses the collapsing problem.

answered 2010-12-17T10:41:52.217

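If you have problems with the collapsing patch, the Solr issue tracker is the place to report them. I can see that other people are currently having some issues with it too, so I suggest getting involved in its development.

That said, if your application needs to search for "real documents", I would recommend building the index around those "real documents" rather than around their individual pages.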

answered 2010-08-12T13:40:27.403

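If your only requirement is to show page numbers, I suggest using the highlighter or doing some custom development. You can store the word offsets at which each page begins and ends in a custom structure; then, knowing the position of the matched word within the whole document, you can tell which page it appears on. If the documents are very large, you will get a good performance gain.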
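
The page lookup could look roughly like the sketch below. It assumes pages are contiguous, so only the starting word offset of each page needs to be stored (the end of a page is the start of the next one). The name `page_for_position` and the list layout are illustrative assumptions, not an existing Lucene API.

```python
from bisect import bisect_right


def page_for_position(page_starts, word_pos):
    """Return the 1-based page number that contains `word_pos`.

    `page_starts[i]` is the offset of the first word on page i + 1,
    sorted ascending, with page_starts[0] == 0 (assumed layout).
    """
    if not page_starts or page_starts[0] != 0:
        raise ValueError("page_starts must be non-empty and begin at 0")
    if word_pos < 0:
        raise ValueError("word_pos must be non-negative")
    # bisect_right counts how many pages start at or before word_pos,
    # which is exactly the 1-based number of the page holding it.
    return bisect_right(page_starts, word_pos)


if __name__ == "__main__":
    starts = [0, 250, 610]  # three pages, starting at words 0, 250 and 610
    for pos in (0, 249, 250, 700):
        print(pos, "-> page", page_for_position(starts, pos))
```

The lookup is a binary search, so it costs O(log n) in the number of pages per match, and the index then holds one entry per real document instead of one per page.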

answered 2010-12-17T10:09:36.327