
I'm using Lucene.Net version 3.0.3, and I want to run a regular expression search. I tried the following code:

// code

String SearchExpression = "[DM]ouglas";

const int hitsLimit = 1000000;

//state the file location of the index
string indexFileLocation = IndexLocation;
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(indexFileLocation);

//create an index searcher that will perform the search
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);

var analyzer = new WhitespaceAnalyzer();

var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] { Field_Content }, analyzer);

Term t = new Term(Field_Content, SearchExpression);
RegexQuery scriptQuery = new RegexQuery(t);

string s = string.Format("{0}", SearchExpression);

var query = parser.Parse(s);

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(query, Occur.MUST);

var hits = searcher.Search(booleanQuery, null, hitsLimit, Sort.RELEVANCE).ScoreDocs;


foreach (var hit in hits)
{
    var hitDocument = searcher.Doc(hit.Doc);

    string contentValue = hitDocument.Get(Field_Content);
}

// end of code

When I search with the pattern "Do*uglas", I get results.

However, if I search with the pattern "[DM]ouglas", I get the following error:

"Cannot parse '[DM]ouglas': Encountered " "]" "] "" at line 1, column 3. Was expecting one of: "TO" ... <RANGEIN_QUOTED> ... <RANGEIN_GOOP> ...".

I also tried the simple pattern ".ouglas". Since "Douglas" appears in the text content, it should give me results.

Does anyone know how to do a regular expression search with Lucene.Net 3.0.3?


1 Answer


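The StandardQueryParser doesn't support regular expressions at all. Instead, it tries to interpret that part of the query as a range query, because square brackets are its syntax for inclusive ranges such as [a TO b]. That is why the error says it was expecting "TO".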

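If you want to search with a regular expression, you need to construct a RegexQuery manually. Be aware that RegexQuery tends to perform poorly. You can improve this by switching from JavaUtilRegexCapabilities to JakartaRegexpCapabilities.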
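In Lucene.Net itself that means skipping the parser and handing the query straight to the searcher, along the lines of searcher.Search(new RegexQuery(new Term(Field_Content, "[DM]ouglas")), hitsLimit). To show why such a query can be slow, here is a toy model in plain C# (no Lucene.Net types, all names hypothetical): a field's term dictionary is modelled as a sorted list of distinct terms, and the regex is tested against every one of them, so the cost grows with the number of distinct terms in the field. Anchoring the pattern to the whole term is an assumption of this sketch; check how the regex capability class you actually use matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy model, not Lucene.Net code: a regex term query walks every term of one field.
public static class RegexTermScan
{
    // Returns the terms that the pattern matches in full.
    // Assumption: the pattern must match the whole term, hence ^(?:...)$.
    public static List<string> MatchingTerms(IEnumerable<string> fieldTerms, string pattern)
    {
        var regex = new Regex("^(?:" + pattern + ")$");
        var matches = new List<string>();
        foreach (var term in fieldTerms) // one regex test per distinct term
        {
            if (regex.IsMatch(term))
                matches.Add(term);
        }
        return matches;
    }

    public static void Main()
    {
        // Terms roughly as a WhitespaceAnalyzer emits them: split on whitespace, case kept,
        // then de-duplicated and sorted ordinally like a term dictionary.
        var terms = "Douglas met Mouglas and douglas"
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .OrderBy(t => t, StringComparer.Ordinal)
            .ToList();

        Console.WriteLine(string.Join(",", MatchingTerms(terms, "[DM]ouglas"))); // Douglas,Mouglas
        Console.WriteLine(string.Join(",", MatchingTerms(terms, ".ouglas")));    // Douglas,Mouglas,douglas
    }
}
```

The same model also explains the ".ouglas" case in the question: through the query parser, "." has no special meaning, so the parsed query looks for a term literally spelled ".ouglas", which the index does not contain.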

answered 2013-07-18T16:29:14.650