java - Lucene search returns no results when file contents are stored

Question

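I am trying to build a log query system with Apache Lucene. I wrote some demo code that indexes two files and then searches for a query string.

The first file contains the data maclean

The second file contains the data pinto

Below is the code I use for indexing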

    fis = new FileInputStream(file);
    DataInputStream in = new DataInputStream(fis);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;

    Document doc = new Document();
    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
    doc.add(new StoredField("filename", file.getCanonicalPath()));

    if (indexWriter.getConfig().getOpenMode() == OpenMode.CREATE) {
        System.out.println("adding " + file);
        indexWriter.addDocument(doc);
    } else {
        System.out.println("updating " + file);
        indexWriter.updateDocument(new Term("path", file.getPath()), doc);
    }

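If I use this code, I get the proper results. But when displaying them I can only show the file name, because the file name is the only thing I stored.

So I modified the code and stored the file contents with this code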

    FileInputStream fis = null;
    if (file.isHidden() || file.isDirectory() || !file.canRead() || !file.exists()) {
        return;
    }
    if (suffix != null && !file.getName().endsWith(suffix)) {
        return;
    }
    System.out.println("Indexing file " + file.getCanonicalPath());

    try {
        fis = new FileInputStream(file);
    } catch (FileNotFoundException fnfe) {
        System.out.println("File Not Found" + fnfe);
    }
    DataInputStream in = new DataInputStream(fis);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    String Data = "";
    while ((strLine = br.readLine()) != null) {
        Data = Data + strLine;
    }

    Document doc = new Document();
    doc.add(new TextField("contents", Data, Field.Store.YES));
    doc.add(new StoredField("filename", file.getCanonicalPath()));

    if (indexWriter.getConfig().getOpenMode() == OpenMode.CREATE) {
        System.out.println("adding " + file);
        indexWriter.addDocument(doc);
    } else {
        System.out.println("updating " + file);
        indexWriter.updateDocument(new Term("path", file.getPath()), doc);
    }

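As I understand it, I should get a result count of 1, showing the file name and contents of the file that contains maclean.

Instead, the result I get is

    -----------------------Results--------------------------

    0 total matching documents
    Found 0

Am I doing something wrong in the code, or is there a reasonable explanation for this? Why does the first version work while the second does not?

Search query code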

    try {
        Directory directory = FSDirectory.open(indexDir);
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);

        QueryParser parser = new QueryParser(Version.LUCENE_41, "contents", analyzer);
        Query query = parser.parse(queryStr);
        System.out.println("Searching for: " + query.toString("contents"));
        TopDocs results = searcher.search(query, maxHits);

        ScoreDoc[] hits = results.scoreDocs;
        int numTotalHits = results.totalHits;

        System.out.println("\n\n\n-----------------------Results--------------------------\n\n\n");
        System.out.println(numTotalHits + " total matching documents");

        for (int i = 0; i < numTotalHits; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);

            System.out.println(i + ":File name is: " + d.get("filename"));
            System.out.println(i + ":File content is: " + d.get("contents"));
        }
        System.out.println("Found " + numTotalHits);
    } catch (Exception e) {
        System.out.println("Exception Was caused in SimpleSearcher");
        e.printStackTrace();
    }

score 1 · Accepted Answer

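I think your exact problem is that by the time you create the BufferedReader for the indexed field, you have already read the whole file, so the stream is at the end of the file and there is nothing left to read. You should be able to fix that by calling fis.reset();

However, you shouldn't do that. Don't keep the same data in two separate fields, one for indexing and one for storage. Instead, set up a single field that both stores and indexes the data. TextField has a constructor that lets you do both, for example: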

doc.add(new TextField("contents", Data, Field.Store.YES));
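To illustrate the end-of-stream point above, here is a minimal JDK-only sketch (no Lucene involved). The class and method names (StreamExhaustionDemo, drain, readTwice) are made up for this illustration and do not come from the question. It reads one FileInputStream twice; the second pass should come back empty because the first pass already consumed the file. It also prints markSupported(), which is false for a plain FileInputStream, so reset() is not actually available on it and reopening the file (or reading it once into a String) is the safer route.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: shows that a stream read to EOF yields nothing on a second pass.
class StreamExhaustionDemo {

    /** Reads all remaining lines from the stream, concatenated without separators. */
    static String drain(InputStream in) throws IOException {
        // The reader is deliberately not closed: closing it would close the shared stream.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line);
        }
        return sb.toString();
    }

    /** Writes content to a temp file, then drains the same FileInputStream twice. */
    static String[] readTwice(String content) {
        try {
            Path file = Files.createTempFile("lucene-demo", ".txt");
            try {
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                try (FileInputStream fis = new FileInputStream(file.toFile())) {
                    String first = drain(fis);   // consumes the whole file
                    String second = drain(fis);  // stream is already at end of file
                    return new String[] { first, second };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether a FileInputStream on a fresh temp file supports mark/reset. */
    static boolean fileStreamSupportsMark() {
        try {
            Path file = Files.createTempFile("lucene-demo", ".txt");
            try (FileInputStream fis = new FileInputStream(file.toFile())) {
                return fis.markSupported();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] reads = readTwice("maclean");
        System.out.println("first read:  [" + reads[0] + "]");
        System.out.println("second read: [" + reads[1] + "]");
        System.out.println("mark/reset supported: " + fileStreamSupportsMark());
    }
}
```

If a Reader handed to a field sees an already exhausted stream like the second read here, there would be no text for that field to index, which is the situation this answer describes.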

score 1 · Accepted Answer

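Use StoredField instead of TextField

doc.add(new StoredField("Data",Line));

When you use a TextField, the string gets tokenized, so you will not be able to search for the same content. A StoredField stores the whole string without tokenizing it.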

score 0 · Accepted Answer

This works with Lucene 4.5: doc.add(new TextField("Data", Data, Field.Store.YES));

Answered 2013-10-16T20:35:03.387

score 0 · Accepted Answer

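I think there may be two problems with your code.

First, I noticed that you are not using near-real-time (NRT) search, and you are not committing the writer before reading. Lucene's IndexReader takes a snapshot of the index: the committed version when NRT is not used, or both committed and uncommitted changes when NRT is used. That may be why your IndexReader cannot see the changes. Since you seem to need concurrent reads and writes, I suggest using NRT search ( IndexReader reader = DirectoryReader.open(indexWriter);)

The second problem may be that, as @femtoRgon said, the data you stored is not what you expect. I noticed that when you append the file contents for storage, you seem to lose the EOL characters. I suggest using Luke to inspect your index: http://www.getopt.org/luke/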
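The lost-EOL point can be shown with a small JDK-only sketch. The class and method names (LineJoinDemo, joinWithoutSeparator, joinWithNewlines) are hypothetical and only serve this illustration. The first method mirrors the question's Data = Data + strLine loop; the second puts a newline back between lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class: compares two ways of rebuilding text from readLine().
class LineJoinDemo {

    /** Mirrors the question's loop: readLine() strips line terminators and nothing replaces them. */
    static String joinWithoutSeparator(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            StringBuilder data = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                data.append(line);
            }
            return data.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Same loop, but re-inserts a newline between lines so words stay separated. */
    static String joinWithNewlines(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            StringBuilder data = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                if (data.length() > 0) {
                    data.append('\n');
                }
                data.append(line);
            }
            return data.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String twoLines = "maclean\npinto";
        System.out.println("[" + joinWithoutSeparator(twoLines) + "]");
        System.out.println("[" + joinWithNewlines(twoLines) + "]");
    }
}
```

With the first form, the last word of one line and the first word of the next are glued together, so an analyzer would most likely see them as a single token and a search for either word alone could miss. In the question each file holds a single word, so this is probably not the whole explanation there, but it is worth ruling out with Luke.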
