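I have a Python script that simply indexes Unicode sentences into a Lucene index. It works in my trials with 100 and 1,000 sentences. However, when I try to index 200,000 sentences, I get a merge error at the 4,514th sentence. What is the problem, and how can I fix it?
Error: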
Exception in thread "Thread-4543" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /home/alvas/europarl/index/_70g.tii (Too many open files)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: java.io.FileNotFoundException: /home/alvas/europarl/index/_70g.tii (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:593)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:435)
at org.apache.lucene.index.TermInfosWriter.initialize(TermInfosWriter.java:91)
at org.apache.lucene.index.TermInfosWriter.<init>(TermInfosWriter.java:83)
at org.apache.lucene.index.TermInfosWriter.<init>(TermInfosWriter.java:77)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:381)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
Traceback (most recent call last):
File "indexer.py", line 183, in <module>
incrementalIndexing(sfile,tfile,indexDir)
File "indexer.py", line 141, in incrementalIndexing
writer.optimize(); writer.close()
lucene.JavaError: java.io.IOException: background merge hit exception: _70e:c4513 _70f:c1 into _70g [optimize]
Java stacktrace:
java.io.IOException: background merge hit exception: _70e:c4513 _70f:c1 into _70g [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1749)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1689)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1669)
Caused by: java.io.FileNotFoundException: /home/alvas/europarl/index/_70g.tii (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:593)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:435)
at org.apache.lucene.index.TermInfosWriter.initialize(TermInfosWriter.java:91)
at org.apache.lucene.index.TermInfosWriter.<init>(TermInfosWriter.java:83)
at org.apache.lucene.index.TermInfosWriter.<init>(TermInfosWriter.java:77)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:381)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3109)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
My code: http://pastebin.com/Ep133W5f
Sample input files: http://pastebin.com/r5qE4qpt , http://pastebin.com/wxCU277x
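
The trace reports "Too many open files", so it may help to compare the per-process limit on open file descriptors with the number of files sitting in the index directory. The following is only a minimal diagnostic sketch, not part of indexer.py. It assumes a Unix-like system (Python's `resource` module is not available on Windows), and `open_file_limits` and `count_index_files` are hypothetical helper names introduced here for illustration.

```python
import os
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_index_files(index_dir):
    """Count the regular files directly inside index_dir (hypothetical helper)."""
    return sum(
        1
        for name in os.listdir(index_dir)
        if os.path.isfile(os.path.join(index_dir, name))
    )


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)

    # Index path copied from the stack trace above; adjust as needed.
    index_dir = "/home/alvas/europarl/index"
    if os.path.isdir(index_dir):
        print("files in index directory:", count_index_files(index_dir))
```

If the file count in the index directory approaches the soft limit, that would be consistent with the error above, although it does not by itself show which part of the script leaves the files open.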