java - Lucene skips years when using NumericRangeQuery on dates

Question

We are running a Lucene query for the date range 20000101 to 20070531, but Lucene only returns documents whose publicationDate falls in 20000101-20000701 or 20070101-20070531, so several years in the middle are skipped. Other date ranges give similar results.

Full insert code:

Document doc = new Document();
doc.add(new Field("pageNumber", article.getPageNumber(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField("publicationDate", 8, Field.Store.YES, true).setIntValue(Integer.parseInt(article.getPublicationDate())));
doc.add(new Field("headline", article.getHeadline(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text", article.getText(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("fileName", article.getFileName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaType", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaSource", article.getMediaSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("overLap", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("status", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
indexWriter.addDocument(doc);

Document count code:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);

    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    System.out.println("start: " + startDate);
    System.out.println("end: " + endDate);
    System.out.println("total: " + collector.getTotalHits());

    String hitCount = String.valueOf(collector.getTotalHits());
    searcher.close();
    reader.close();
    analyzer.close();
    return hitCount;

Full document list:

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
    IndexReader reader = IndexReader.open(index);

    Query sourceQuery = new TermQuery(new Term("mediaSource", source));
    QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
    Query textQuery = queryParser.parse(terms);
    Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
    booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);

    IndexSearcher searcher = new IndexSearcher(reader);
    TotalHitCountCollector collector = new TotalHitCountCollector();
    searcher.search(booleanQuery, collector);

    Sort sort = new Sort(new SortField("publicationDate", SortField.INT));

    if (collector.getTotalHits() > 0) {
        TopDocs topDocs = searcher.search(booleanQuery, collector.getTotalHits(), sort);

        int i = 0;
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            ArrayList<String> resultRow = new ArrayList<String>();
            Document doc = searcher.doc(scoreDoc.doc);
            resultRow.add(String.valueOf(i));
            resultRow.add(doc.get("publicationDate"));
            resultRow.add(doc.get("mediaSource"));
            resultRow.add(doc.get("fileName"));
            resultRow.add(doc.get("headline"));
            resultRow.add(doc.get("pageNumber"));
            ql.results.put(String.valueOf(i), resultRow);
            i++;
        }
    } else {
        ArrayList<String> resultRow = new ArrayList<String>();
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        resultRow.add("0");
        ql.results.put("0", resultRow);
    }

Truncated results (last 10 of 2058 documents):

20021231    Iraq Belongs on the Back Burner
20021231    With Missionaries Spreading, Muslims' Anger Is Following
20021231    WHITE HOUSE CUTS ESTIMATE OF COST OF WAR WITH IRAQ
20021231    Bring Back the Draft
20040101    Pakistani Leader's New Tactic: Persuasion
20040101    What We Will Do in 2004
20040101    Ethnic Morass Bogs Down Afghan Talks On Charter
20040101    U.S. Hunts Terror Clues in Case of 2 Brothers
20040101    Giving Up Those Weapons: After Libya, Who Is Next?
20040101    An Odd Sight in Iran as Aid Team Tents Go Up: The U.S. Flag

score -1 · Accepted Answer

The problem is that NumericRangeQuery does not work properly here. Using a RangeQuery with string values solves the problem.
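
Why a string range can work for these dates: zero-padded yyyyMMdd strings of equal length sort lexicographically in the same order as the dates themselves. The sketch below shows only that comparison in plain Java, without Lucene; the class DateStringRange and its methods are made up for illustration. In Lucene 3.x the string range class is, as far as I know, TermRangeQuery, and the field would need to be indexed as a single NOT_ANALYZED string term. Neither point is stated in the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: models an inclusive string range
// over zero-padded yyyyMMdd values.
class DateStringRange {

    // True when start <= date <= end under plain String ordering.
    // This only agrees with date order because all values have 8 digits.
    static boolean inRange(String date, String start, String end) {
        return date.compareTo(start) >= 0 && date.compareTo(end) <= 0;
    }

    // Keeps the dates that fall inside the inclusive range.
    static List<String> filter(List<String> dates, String start, String end) {
        List<String> hits = new ArrayList<>();
        for (String d : dates) {
            if (inRange(d, start, end)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "19991231", "20000101", "20021231", "20040101", "20070531", "20070601");
        // prints [20000101, 20021231, 20040101, 20070531]
        System.out.println(filter(dates, "20000101", "20070531"));
    }
}
```

The trade-off is that every date has to be written with the same width; a value such as "2000101" would sort in the wrong place.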

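A possible root cause, not confirmed in the thread: the insert code creates the field with new NumericField("publicationDate", 8, ...), which is a precisionStep of 8. The query uses the five-argument NumericRangeQuery.newIntRange(...), which falls back to the default precisionStep of 4. Lucene expects the same value at index and query time. The overload newIntRange(field, precisionStep, min, max, minInclusive, maxInclusive) lets the query pass 8 explicitly. The sketch below is a simplified plain-Java model of trie-style range splitting for non-negative ints, not Lucene's actual code; PrecisionStepDemo and its methods are made up for illustration. It shows how a query split with step 4 can ask for prefix terms that an index built with step 8 never wrote, which leaves a hole in the middle of the range.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of trie-encoded numeric ranges (non-negative ints only).
class PrecisionStepDemo {

    // One sub-range of the query: terms at `shift` whose prefix lies in [min, max].
    static final class PrefixRange {
        final int shift;
        final long min;
        final long max;

        PrefixRange(int shift, long min, long max) {
            this.shift = shift;
            this.min = min;
            this.max = max;
        }
    }

    // Splits [lo, hi] into prefix ranges: full precision at the edges,
    // coarser prefixes (higher shift) for the middle of the range.
    static List<PrefixRange> split(int step, long lo, long hi) {
        List<PrefixRange> out = new ArrayList<>();
        if (lo > hi) {
            return out;
        }
        for (int shift = 0; ; shift += step) {
            long diff = 1L << (shift + step);
            long mask = ((1L << step) - 1L) << shift;
            boolean hasLower = (lo & mask) != 0L;
            boolean hasUpper = (hi & mask) != mask;
            long nextLo = (hasLower ? lo + diff : lo) & ~mask;
            long nextHi = (hasUpper ? hi - diff : hi) & ~mask;
            if (shift + step >= 32 || nextLo > nextHi) {
                // No coarser level is usable: cover what is left at this shift.
                out.add(new PrefixRange(shift, lo >>> shift, hi >>> shift));
                break;
            }
            if (hasLower) {
                out.add(new PrefixRange(shift, lo >>> shift, (lo | mask) >>> shift));
            }
            if (hasUpper) {
                out.add(new PrefixRange(shift, (hi & ~mask) >>> shift, hi >>> shift));
            }
            lo = nextLo;
            hi = nextHi;
        }
        return out;
    }

    // A value indexed with `indexStep` only has terms at shifts 0, indexStep, 2*indexStep, ...
    // Query ranges at any other shift find nothing.
    static boolean matches(long value, int indexStep, List<PrefixRange> ranges) {
        for (PrefixRange r : ranges) {
            if (r.shift % indexStep != 0) {
                continue; // no term was written at this shift
            }
            long prefix = value >>> r.shift;
            if (prefix >= r.min && prefix <= r.max) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long lo = 20000101L, hi = 20070531L, probe = 20030615L;
        // prints "query step 8, index step 8: true"
        System.out.println("query step 8, index step 8: " + matches(probe, 8, split(8, lo, hi)));
        // prints "query step 4, index step 8: false"
        System.out.println("query step 4, index step 8: " + matches(probe, 8, split(4, lo, hi)));
    }
}
```

If this is what happened here, passing 8 as the precisionStep to newIntRange, or reindexing with the default step, should bring back the missing years without switching to string terms.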