lucene.net - How do I group the results of a Lucene.Net search?

Question

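I have managed to create documents and run some fairly complex searches, but I am having trouble grouping the search results.

Displaying the books returned by a search works fine. In addition, I need a count of results grouped by Author, based on the same search query.

For example: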

Author Name      | Count
A                | 12
B                | 2

I am using Lucene.Net 3.0.3.0, which does not support grouping, but there may be a workaround. I also need the same functionality for price ranges.

score 2 · Accepted Answer

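Anything is possible if you write a custom Collector. What you are describing is faceting, and it is easy to solve by counting the document values yourself. The core part is calling the IndexSearcher.Search overload that accepts a collector. The collector should read the values, usually through a field cache implementation, and do the required calculations.

Here is a short demo that uses some classes from my demo project, Corelicious.Lucene.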

// Maps each PostTypeId value to the number of matching documents.
var postTypes = new Dictionary<Int32, Int32>();
searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
    var score = scorer.Score();
    if (score > 0) {
        // Read the field value for this hit through the field cache.
        var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
        if (postType.HasValue) {
            if (postTypes.ContainsKey(postType.Value)) {
                postTypes[postType.Value]++;
            } else {
                postTypes[postType.Value] = 1;
            }
        }
    }
}));
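
DelegatingCollector and SingleFieldCache above come from Corelicious.Lucene. For the Author counts in the question, roughly the same idea can be sketched with only stock Lucene.Net 3.0.3 types. This is a minimal sketch, not a definitive implementation: the class name FieldValueCountCollector and the field name "Author" are made up for illustration, and it assumes the field is indexed NOT_ANALYZED with exactly one value per document.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper: counts matching documents per value of a
// single-valued, untokenized field (for example "Author").
public sealed class FieldValueCountCollector : Collector {
    private readonly string field;
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();

    // Field values of the current segment, indexed by segment-local doc id.
    private string[] values;

    public FieldValueCountCollector(string field) {
        this.field = field;
    }

    public IDictionary<string, int> Counts {
        get { return counts; }
    }

    public override void SetScorer(Scorer scorer) {
        // Scores are not needed for counting.
    }

    public override void SetNextReader(IndexReader reader, int docBase) {
        // The field cache loads every value of the field for this segment once.
        values = FieldCache_Fields.DEFAULT.GetStrings(reader, field);
    }

    public override void Collect(int doc) {
        var value = values[doc];
        if (value == null) {
            return; // the document has no value for this field
        }

        int count;
        counts.TryGetValue(value, out count); // count stays 0 for a new value
        counts[value] = count + 1;
    }

    public override bool AcceptsDocsOutOfOrder {
        get { return true; } // counting does not depend on document order
    }
}
```

It would be used with the same query that produces the book list, along the lines of: var collector = new FieldValueCountCollector("Author"); searcher.Search(query, collector); after which collector.Counts holds one entry per author.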

Full code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml;
using Corelicious.Lucene;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;
using Version = Lucene.Net.Util.Version;

namespace ConsoleApplication {
    public static class Program {
        public static void Main(string[] args) {
            Console.WriteLine ("Creating directory...");
            var directory = new RAMDirectory();
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);
            CreateIndex(directory, analyzer);

            var userQuery = "calculate pi";
            var queryParser = new QueryParser(Version.LUCENE_30, "Body", analyzer);
            var query = queryParser.Parse(userQuery);
            Console.WriteLine("Query: '{0}'", query);

            var indexReader = IndexReader.Open(directory, readOnly: true);
            var searcher = new IndexSearcher(indexReader);

            var postTypes = new Dictionary<Int32, Int32>();
            searcher.Search(query, new DelegatingCollector((reader, doc, scorer) => {
                var score = scorer.Score();
                if (score > 0) {
                    var postType = SingleFieldCache.Default.GetInt32(reader, "PostTypeId", doc);
                    if (postType.HasValue) {
                        if (postTypes.ContainsKey(postType.Value)) {
                            postTypes[postType.Value]++;
                        } else {
                            postTypes[postType.Value] = 1;
                        }
                    }
                }
            }));

            Console.WriteLine("Post type summary");
            Console.WriteLine("Post type  | Count");

            foreach(var pair in postTypes.OrderByDescending(x => x.Value)) {
                var postType = (PostType)pair.Key;
                Console.WriteLine("{0,-10} | {1}", postType, pair.Value);
            }

            Console.ReadLine ();
        }

        public enum PostType {
            Question = 1,
            Answer = 2,
            Tag = 4
        }

        public static void CreateIndex(Directory directory, Analyzer analyzer) {
            using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            using (var xmlStream = File.OpenRead("/Users/sisve/Downloads/Stack Exchange Data Dump - Sept 2011/Content/092011 Mathematics/posts.xml"))
            using (var xmlReader = XmlReader.Create(xmlStream)) {
                while (xmlReader.ReadToFollowing("row")) {
                    var tags = xmlReader.GetAttribute("Tags") ?? String.Empty;
                    var title = xmlReader.GetAttribute("Title") ?? String.Empty;
                    // Field values must not be null, so fall back to an empty string.
                    var body = xmlReader.GetAttribute("Body") ?? String.Empty;

                    var doc = new Document();

                    // tags are stored as <tag1><tag2>
                    foreach (Match match in Regex.Matches(tags, "<(.*?)>")) {
                        doc.Add(new Field("Tags", match.Groups[1].Value, Field.Store.NO, Field.Index.NOT_ANALYZED));
                    }

                    doc.Add(new Field("Title", title, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("Body", body, Field.Store.NO, Field.Index.ANALYZED));
                    doc.Add(new Field("PostTypeId", xmlReader.GetAttribute("PostTypeId"), Field.Store.NO, Field.Index.NOT_ANALYZED));

                    writer.AddDocument(doc);
                }

                writer.Optimize();
                writer.Commit();
            }
        }
    }
}
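
The question also asks for counts per price range. The same collector approach should carry over; the sketch below is an illustration under assumptions rather than tested code. The class name PriceRangeCountCollector and the field name "Price" are hypothetical, and the field is assumed to be indexed NOT_ANALYZED as a plain decimal string such as "12.50", one value per document. A price indexed through NumericField would need to be read differently.

```csharp
using System.Globalization;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper: counts matching documents per price bucket.
// upperBounds must be ascending; bucket i holds prices below upperBounds[i]
// (and at or above the previous bound), the last bucket holds everything else.
public sealed class PriceRangeCountCollector : Collector {
    private readonly string field;
    private readonly double[] upperBounds;
    private readonly int[] counts;

    // Raw field values of the current segment, indexed by segment-local doc id.
    private string[] values;

    public PriceRangeCountCollector(string field, params double[] upperBounds) {
        this.field = field;
        this.upperBounds = upperBounds;
        this.counts = new int[upperBounds.Length + 1];
    }

    public int[] Counts {
        get { return counts; }
    }

    public override void SetScorer(Scorer scorer) {
        // Scores are not needed for counting.
    }

    public override void SetNextReader(IndexReader reader, int docBase) {
        values = FieldCache_Fields.DEFAULT.GetStrings(reader, field);
    }

    public override void Collect(int doc) {
        double price;
        var raw = values[doc];
        if (raw == null || !double.TryParse(raw, NumberStyles.Float,
                CultureInfo.InvariantCulture, out price)) {
            return; // skip documents with a missing or unparseable price
        }

        // Find the first bucket whose upper bound is above the price.
        var bucket = 0;
        while (bucket < upperBounds.Length && price >= upperBounds[bucket]) {
            bucket++;
        }
        counts[bucket]++;
    }

    public override bool AcceptsDocsOutOfOrder {
        get { return true; } // counting does not depend on document order
    }
}
```

For example, new PriceRangeCountCollector("Price", 10.0, 50.0, 100.0) would produce four buckets: below 10, 10 up to 50, 50 up to 100, and 100 or more. Parsing the string on every hit keeps the sketch short; for large result sets it would likely be better to parse each segment's values once in SetNextReader.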
