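I am trying to index files of close to 20 MB using Lucene 4.4.0. According to the Lucene website, the indexing process consumes close to 1 MB of heap.
But I have deployed my indexing code on my application server, and the javaopts are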
-Xms8192m -Xmx16384m -XX:MaxPermSize=512m
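Right now there is only 1 file to be indexed, and its size is 20 MB.
During indexing, all I get is a lock file; no index is generated. The following error occurs and indexing stops: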
java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
18:42:21,764 INFO at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
18:42:21,765 INFO at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
18:42:21,765 INFO at org.apache.lucene.document.Field.tokenStream(Field.java:541)
18:42:21,765 INFO at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
18:42:21,766 INFO at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
18:42:21,766 INFO at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
18:42:21,766 INFO at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
18:42:21,767 INFO at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
18:42:21,767 INFO at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
18:42:21,767 INFO at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
18:42:21,768 INFO at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
18:42:21,768 INFO at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
18:42:21,768 INFO at com.rancore.MainClass1.main(MainClass1.java:95)
18:42:21,771 INFO java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
18:42:21,772 INFO at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
18:42:21,911 INFO at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
18:42:21,911 INFO at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
18:42:21,912 INFO at com.rancore.MainClass1.main(MainClass1.java:122)
18:42:22,008 INFO Indexing to directory
Can someone guide me on where the problem seems to be?
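One thing that may be worth ruling out is whether the javaopts listed above actually reach the JVM that runs the indexer. The following is only a minimal sketch, assuming it is executed inside the same server process; the class name HeapCheck and the method maxHeapMb are hypothetical and not part of the original code. It uses only the standard Runtime API.

```java
class HeapCheck {
    private static final long MB = 1024L * 1024L;

    // Maximum heap the running JVM will attempt to use, in MB.
    // With -Xmx16384m this should report a value close to 16384.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):   " + maxHeapMb());
        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("total heap (MB): " + rt.totalMemory() / MB);
        System.out.println("free heap (MB):  " + rt.freeMemory() / MB);
    }
}
```

If the reported maximum is far below the configured -Xmx value, the options are probably not being applied to the process that performs the indexing.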
Indexing code snippet:
static void indexDocs(IndexWriter writer, File file, boolean flag)
        throws IOException {
    FileInputStream fis = null;
    if (file.canRead()) {
        if (file.isDirectory()) {
            String[] files = file.list();
            // an IO error could occur
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    indexDocs(writer, new File(file, files[i]), flag);
                }
            }
        } else {
            try {
                fis = new FileInputStream(file);
            } catch (FileNotFoundException fnfe) {
                fnfe.printStackTrace();
            }
            try {
                Document doc = new Document();
                Field pathField = new StringField("path", file.getPath(),
                        Field.Store.YES);
                doc.add(pathField);
                doc.add(new LongField("modified", file.lastModified(),
                        Field.Store.NO));
                doc.add(new StringField("name", file.getName(), Field.Store.YES));
                doc.add(new TextField("contents", new BufferedReader(
                        new InputStreamReader(fis, "UTF-8"))));
                LineNumberReader lnr = new LineNumberReader(new FileReader(file));
                String line = null;
                while (null != (line = lnr.readLine())) {
                    doc.add(new StringField("SC", line.trim(), Field.Store.YES));
                    // doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
                }
                if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                    writer.addDocument(doc);
                    writer.commit();
                    fis.close();
                } else {
                    try {
                        writer.updateDocument(new Term("path", file.getPath()), doc);
                        fis.close();
                    } catch (Exception e) {
                        writer.close();
                        fis.close();
                        e.printStackTrace();
                    }
                }
            } catch (Exception e) {
                writer.close();
                fis.close();
                e.printStackTrace();
            } finally {
                // writer.close();
                fis.close();
            }
        }
    }
}
The problem occurs when the document is added to the writer.
Please advise.
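Since the loop in indexDocs adds one "SC" StringField per line to a single Document, it may help to know how many lines the 20 MB file actually contains. This is a rough, standalone sketch under stated assumptions: Java 7 or later, a file that is valid UTF-8, and a hypothetical class name LineFieldCount that does not exist in the original code. It only counts lines and does not touch Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LineFieldCount {
    // Counts the lines in a file. The while loop in indexDocs creates
    // one "SC" field per line, so this is also the number of such fields.
    static long countLines(Path path) throws IOException {
        long count = 0;
        // Assumption: the file is UTF-8; invalid bytes raise an exception.
        try (BufferedReader reader =
                     Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]); // path of the file to inspect
        System.out.println(path + " has " + countLines(path)
                + " lines, i.e. that many \"SC\" fields in one document");
    }
}
```

A very large count would mean that the single Document handed to addDocument carries that many separate field instances, which is one possible place to look given where the stack trace ends.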
I have also added the IndexWriter code:
public static void main(String[] args) {
    String indexPath = args[0]; // Place where indexes will be created
    String docsPath = args[1];  // Place where the files are kept.
    boolean create = true;
    final File docDir = new File(docsPath);
    if (!docDir.exists() || !docDir.canRead()) {
        System.out.println("Document directory '" + docDir.getAbsolutePath()
                + "' does not exist or is not readable, please check the path");
        System.exit(1);
    }
    Date start = new Date();
    try {
        System.out.println("Indexing to directory FTP CODE ONLY '" + indexPath + "'..." + docsPath);
        Directory dir = FSDirectory.open(new File(indexPath));
        Analyzer analyzer = new CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(dir, iwc);
        if (args[2].trim().equalsIgnoreCase("OverAll")) {
            System.out.println("inside Over All");
            indexDocs(writer, docDir, true);
        } else {
            filenames = args[2].split(",");
            // indexDocs(writer, docDir);
        }
        writer.commit();
        writer.close();
        Date end = new Date();
        System.out.println(end.getTime() - start.getTime() + " total milliseconds");
    } catch (IOException e) {
        System.out.println(" caught a " + e.getClass()
                + "\n with message: " + e.getMessage());
    } catch (Exception e) {
        e.printStackTrace();
    }
}