I am trying to index a file of close to 20 MB using Lucene 4.4.0. According to the Lucene website, the indexing process should consume close to 1 MB of heap.

However, I have deployed the indexing code on my application server, where the javaopts are:

-Xms8192m -Xmx16384m -XX:MaxPermSize=512m
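
To confirm that these options actually reach the JVM running the indexer, a small check along the following lines could be used. This is only a sketch using the standard `Runtime` API; the class name `HeapCheck` and the method `maxHeapMiB` are hypothetical and not part of the indexing code.

```java
class HeapCheck {
    /** Returns the maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        // With -Xmx16384m applied, the first value should be in the region of 16384.
        System.out.println("max heap   (MiB): " + maxHeapMiB());
        System.out.println("total heap (MiB): " + rt.totalMemory() / mib);
        System.out.println("free heap  (MiB): " + rt.freeMemory() / mib);
    }
}
```

If the reported maximum is far below the configured value, the options are probably not being passed to the process that runs the indexing code.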

Right now only 1 file needs to be indexed, and its size is 20 MB.

During indexing, all I get is a lock file and no index is generated. The following error occurs and indexing stops...

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
18:42:21,764 INFO     at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
18:42:21,765 INFO      at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
18:42:21,765 INFO      at org.apache.lucene.document.Field.tokenStream(Field.java:541)
18:42:21,765 INFO      at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
18:42:21,766 INFO      at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
18:42:21,766 INFO      at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
18:42:21,766 INFO      at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
18:42:21,767 INFO      at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
18:42:21,767 INFO      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
18:42:21,767 INFO      at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
18:42:21,768 INFO      at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
18:42:21,768 INFO      at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
18:42:21,768 INFO      at com.rancore.MainClass1.main(MainClass1.java:95)
18:42:21,771 INFO  java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
18:42:21,772 INFO      at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
18:42:21,911 INFO      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
18:42:21,911 INFO    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
18:42:21,912 INFO     at com.rancore.MainClass1.main(MainClass1.java:122)
18:42:22,008 INFO  Indexing to directory 

Can someone point out where the problem seems to be?

Indexing code snippet:

static void indexDocs(IndexWriter writer, File file, boolean flag)
        throws IOException {

    FileInputStream fis = null;
    if (file.canRead()) {
        if (file.isDirectory()) {
            String[] files = file.list();
            // an IO error could occur
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    indexDocs(writer, new File(file, files[i]), flag);
                }
            }
        } else {
            try {
                fis = new FileInputStream(file);
            } catch (FileNotFoundException fnfe) {
                fnfe.printStackTrace();
            }

            try {
                Document doc = new Document();

                Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
                doc.add(pathField);

                doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));

                doc.add(new StringField("name", file.getName(), Field.Store.YES));

                doc.add(new TextField("contents",
                        new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

                LineNumberReader lnr = new LineNumberReader(new FileReader(file));

                String line = null;
                while (null != (line = lnr.readLine())) {
                    doc.add(new StringField("SC", line.trim(), Field.Store.YES));
                    // doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
                }

                if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                    writer.addDocument(doc);
                    writer.commit();
                    fis.close();
                } else {
                    try {
                        writer.updateDocument(new Term("path", file.getPath()), doc);
                        fis.close();
                    } catch (Exception e) {
                        writer.close();
                        fis.close();
                        e.printStackTrace();
                    }
                }
            } catch (Exception e) {
                writer.close();
                fis.close();
                e.printStackTrace();
            } finally {
                // writer.close();
                fis.close();
            }
        }
    }
}
}
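
The `while` loop above adds one stored `SC` field to the same `Document` for every line of the file, so the number of fields in that single document grows with the line count. A rough way to see how many such fields one input would produce, using only the JDK, might look like this. It is a sketch, not part of the indexer; the class name `LineFieldCount` and the method `countLines` are hypothetical.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

class LineFieldCount {
    /** Counts the lines a reader yields, i.e. how many per-line fields the loop would add. */
    static int countLines(Reader in) throws IOException {
        // try-with-resources closes the reader even if readLine() fails
        try (LineNumberReader lnr = new LineNumberReader(in)) {
            int count = 0;
            while (lnr.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // For a real file, pass new java.io.FileReader(path) instead of a StringReader.
        System.out.println(countLines(new StringReader("a\nb\nc\n"))); // prints 3
    }
}
```

Running this against the 20 MB file would show how many `SC` fields end up in the one document that is handed to `addDocument`.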

The problem occurs when the document is added to the writer.

Please advise.

I have also added the IndexWriter code:

public static void main(String[] args) {

    String indexPath = args[0];  // Place where indexes will be created
    String docsPath = args[1];   // Place where the files are kept
    boolean create = true;

    final File docDir = new File(docsPath);
    if (!docDir.exists() || !docDir.canRead()) {
        System.out.println("Document directory '" + docDir.getAbsolutePath() + "' does not exist or is not readable, please check the path");
        System.exit(1);
    }

    Date start = new Date();
    try {
        System.out.println("Indexing to directory FTP CODE ONLY '" + indexPath + "'..." + docsPath);

        Directory dir = FSDirectory.open(new File(indexPath));

        Analyzer analyzer = new CustomAnalyzerForCaseSensitive(Version.LUCENE_44);

        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);

        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);

        IndexWriter writer = new IndexWriter(dir, iwc);

        if (args[2].trim().equalsIgnoreCase("OverAll")) {
            System.out.println("inside Over All");
            indexDocs(writer, docDir, true);
        } else {
            filenames = args[2].split(",");
            // indexDocs(writer, docDir);
        }
        writer.commit();
        writer.close();

        Date end = new Date();
        System.out.println(end.getTime() - start.getTime() + " total milliseconds");

    } catch (IOException e) {
        System.out.println(" caught a " + e.getClass() +
                "\n with message: " + e.getMessage());
    } catch (Exception e) {
        e.printStackTrace();
    }
}