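We use the Lucene.Net 2.0 DLL to search within our documents. Whenever we publish a document, its content is passed to Lucene for indexing. This had been working fine, but now publishing another document throws the following error: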
System.IO.IOException: read past EOF
at Lucene.Net.Store.BufferedIndexInput.Refill()
at Lucene.Net.Store.BufferedIndexInput.ReadByte()
at Lucene.Net.Store.IndexInput.ReadInt()
at Lucene.Net.Index.IndexWriter.ReadDeleteableFiles()
at Lucene.Net.Index.IndexWriter.DeleteSegments(ArrayList segments)
at Lucene.Net.Index.IndexWriter.MergeSegments(Int32 minSegment, Int32 end)
at Lucene.Net.Index.IndexWriter.FlushRamSegments()
at Lucene.Net.Index.IndexWriter.Optimize()
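The problem is that we cannot simply delete the files Lucene created, because thousands of documents would have to be republished to rebuild the index. Can anyone suggest a solution and/or a likely cause of this error? This is the code we use to index a document: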
Analyzer analyzer = new StandardAnalyzer();
// Open the existing index directory (create = false).
Lucene.Net.Store.Directory directory = FSDirectory.GetDirectory(lucenePath, false);

// Remove any previously indexed copy of this document, matched by its "id" term.
try
{
    IndexReader ir = IndexReader.Open(lucenePath);
    ir.DeleteDocuments(new Term("id", document.Lang + "-" + document.IDDoc));
    ir.Close();
}
catch (Exception) { } // any failure here is silently ignored

// Try to open the existing index; if that throws, create a new one.
// Note: create = true starts an empty index in this directory.
IndexWriter iwriter;
try
{
    iwriter = new IndexWriter(directory, analyzer, false);
}
catch (Exception)
{
    iwriter = new IndexWriter(directory, analyzer, true);
}

// Index at most 25000 terms per field.
iwriter.SetMaxFieldLength(25000);

Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("content", fulltext, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("title", document.DocName, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED));
doc.Add(new Lucene.Net.Documents.Field("id", document.Lang + "-" + document.IDDoc, Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

iwriter.AddDocument(doc);
iwriter.Optimize(); // the exception in the stack trace above is thrown from this call
iwriter.Close();
directory.Close();
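
The trace fails inside ReadDeleteableFiles on its first ReadInt call, which suggests (but does not prove) that the file holding Lucene's pending-delete list is empty or cut short. Below is a minimal diagnostic sketch for checking that without touching the index. It uses only System.IO. The file name "deletable" is an assumption based on the Lucene 2.0 index layout, and IndexCheck and HasTruncatedDeletableFile are hypothetical names introduced for illustration; neither is part of Lucene.Net.

```csharp
using System;
using System.IO;

public static class IndexCheck
{
    // Returns true when the index folder contains a "deletable" file that is
    // too short to hold the 4-byte entry count Lucene reads first.
    // Assumption: the pending-delete list lives in a file named "deletable".
    public static bool HasTruncatedDeletableFile(string indexPath)
    {
        FileInfo file = new FileInfo(Path.Combine(indexPath, "deletable"));
        return file.Exists && file.Length < 4;
    }
}
```

Calling IndexCheck.HasTruncatedDeletableFile(lucenePath) before opening the IndexWriter would show whether this is the situation we are in. The sketch only reads file metadata; it makes no claim about whether removing or restoring that file is safe, so the index folder should be backed up before anything is changed.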