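I need advice on writing an efficient query for deleting documents from a Lucene index, using BooleanQuery or some other, more efficient approach. The query should combine multiple terms holding Guid values (documents are deleted by the "Guid" field) with a restriction on the "Version" field.

The index may contain documents that have the same value in the "Guid" field but different values in the "Version" field.

Here is my function: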

private void RemoveFromIndex(string[] guids, IndexWriter writer)
{
    var terms = guids.Select(guid => new Term("Guid", guid)).ToArray();

    if (!isGlobalIndex)
    {
        writer.DeleteDocuments(terms); // This is working perfectly
    }
    else
    {
        // Delete items, but only those of the corresponding version
        BooleanQuery bQ = new BooleanQuery();

        if (!string.IsNullOrEmpty(repository.versionName))
        {
            bQ.Add(new TermQuery(new Term("Version", repository.versionName)), Occur.MUST);
        }

        // Is there a more efficient way of doing it?
        foreach (var term in terms)
        {
            bQ.Add(new TermQuery(term), Occur.SHOULD);
        }

        writer.DeleteDocuments(bQ);
    }
}

1 Answer


Nope, that's really the most efficient way to accomplish what you describe. For the second part of your code (the outer "else" branch), the Lucene query would be something like:

+Version:someVersionName Guid:guid1 Guid:guid2 Guid:guid3

You can print the query object (or inspect it in a debugger) to confirm that this is the Lucene query it actually creates for you. If it is, that's really the simplest way to do this.
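One thing worth checking when you print the query: in a BooleanQuery that contains at least one MUST clause, SHOULD clauses are optional and only influence scoring. A query of the form "+Version:x Guid:a Guid:b" can therefore match every document of version x, whatever its Guid. A minimal sketch of one way to require "this version AND any of these GUIDs" is to put the GUID clauses in their own BooleanQuery and add that as a second MUST clause. It assumes Lucene.Net 3.x (the `Occur.MUST` spelling used in the question); `VersionedDelete` and `BuildDeleteQuery` are hypothetical names, not part of Lucene.Net.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class VersionedDelete
{
    // Hypothetical helper (not a Lucene.Net API). Builds a query of the form
    //   +Version:<version> +(Guid:a Guid:b ...)
    public static Query BuildDeleteQuery(IEnumerable<string> guids, string version)
    {
        // Inner query: with no MUST clause present, a document has to
        // match at least one of these SHOULD clauses.
        var anyGuid = new BooleanQuery();
        foreach (var guid in guids)
        {
            anyGuid.Add(new TermQuery(new Term("Guid", guid)), Occur.SHOULD);
        }

        // No version restriction requested: the GUID query is enough.
        if (string.IsNullOrEmpty(version))
        {
            return anyGuid;
        }

        // Outer query: both the version and the GUID group are required.
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Version", version)), Occur.MUST);
        query.Add(anyGuid, Occur.MUST);
        return query;
    }
}
```

In the question's "else" branch this would be used as `writer.DeleteDocuments(VersionedDelete.BuildDeleteQuery(guids, repository.versionName));`. The cost is the same set of term lookups as the flat query, so it should not change the efficiency picture.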

==Update as per comment:==

First off, I'm not sure which Lucene API you're using; I'm mostly familiar with the Java API. In the Java API you can raise the maximum number of clauses a BooleanQuery may hold (the default is 1024). The setting is static, so it applies to every BooleanQuery in the process, not just one instance. Something like this:

BooleanQuery.setMaxClauseCount(3000);
BooleanQuery bq = new BooleanQuery();

This should help you avoid having to move the query inside the while loop.
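If you are on Lucene.Net rather than Java, the equivalent would look roughly like the sketch below. It assumes Lucene.Net 3.x, where to my knowledge the limit is exposed as the static property `BooleanQuery.MaxClauseCount`; older 2.9 releases spell it differently, so check your version. `ClauseLimit` and `EnsureCapacity` are hypothetical names used only for illustration.

```csharp
using Lucene.Net.Search;

public static class ClauseLimit
{
    // Hypothetical helper (not a Lucene.Net API). Raises the process-wide
    // clause limit only when the planned query would exceed it.
    // Assumes Lucene.Net 3.x, where MaxClauseCount is a static property.
    public static void EnsureCapacity(int clauseCount)
    {
        if (clauseCount > BooleanQuery.MaxClauseCount)
        {
            BooleanQuery.MaxClauseCount = clauseCount;
        }
    }
}
```

Calling something like `ClauseLimit.EnsureCapacity(guids.Length + 1)` before building the query leaves room for the GUID clauses plus the version clause. Because the limit is global, raising it affects every other BooleanQuery in the application as well.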

Also, as far as multi-term queries are concerned, there is an abstract base class, MultiTermQuery, with several concrete implementations such as FuzzyQuery, NumericRangeQuery, RegexQuery, etc. These are meant for more specialized or exotic queries that expand to multiple terms. For simple conditions like yours, BooleanQuery works fine.

Answered 2013-10-16T14:47:52.620