We are running a Lucene query for the date range 20000101 to 20070531, but Lucene only returns documents with a publicationDate between 20000101-20000701 and 20070101-20070531. Lucene skips several years. When running different date sets the results are similar.
Full insert code:
Document doc = new Document();
doc.add(new Field("pageNumber", article.getPageNumber(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new NumericField("publicationDate", 8, Field.Store.YES, true).setIntValue(Integer.parseInt(article.getPublicationDate())));
doc.add(new Field("headline", article.getHeadline(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("text", article.getText(), Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("fileName", article.getFileName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaType", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("mediaSource", article.getMediaSource(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("overLap", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("status", article.getMediaType(), Field.Store.YES, Field.Index.NOT_ANALYZED));
indexWriter.addDocument(doc);
Document count code:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
IndexReader reader = IndexReader.open(index);
Query sourceQuery = new TermQuery(new Term("mediaSource", source));
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
Query textQuery = queryParser.parse(terms);
Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);
IndexSearcher searcher = new IndexSearcher(reader);
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
System.out.println("start: " + startDate);
System.out.println("end: " + endDate);
System.out.println("total: " + collector.getTotalHits());
String hitCount = String.valueOf(collector.getTotalHits());
searcher.close();
reader.close();
analyzer.close();
return hitCount;
Full document list:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
Directory index = new SimpleFSDirectory(new File(LUCENE_INDEX_DIRECTORY));
IndexReader reader = IndexReader.open(index);
Query sourceQuery = new TermQuery(new Term("mediaSource", source));
QueryParser queryParser = new QueryParser(Version.LUCENE_36, "text", analyzer);
Query textQuery = queryParser.parse(terms);
Query dateRangeQuery = NumericRangeQuery.newIntRange("publicationDate", startDate, endDate, true, true);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(sourceQuery, BooleanClause.Occur.MUST);
booleanQuery.add(textQuery, BooleanClause.Occur.MUST);
booleanQuery.add(dateRangeQuery, BooleanClause.Occur.MUST);
IndexSearcher searcher = new IndexSearcher(reader);
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(booleanQuery, collector);
Sort sort = new Sort(new SortField("publicationDate", SortField.INT));
if (collector.getTotalHits() > 0) {
TopDocs topDocs = searcher.search(booleanQuery, collector.getTotalHits(), sort);
int i = 0;
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
ArrayList<String> resultRow = new ArrayList<String>();
Document doc = searcher.doc(scoreDoc.doc);
resultRow.add(String.valueOf(i));
resultRow.add(doc.get("publicationDate"));
resultRow.add(doc.get("mediaSource"));
resultRow.add(doc.get("fileName"));
resultRow.add(doc.get("headline"));
resultRow.add(doc.get("pageNumber"));
ql.results.put(String.valueOf(i), resultRow);
i++;
}
} else {
ArrayList<String> resultRow = new ArrayList<String>();
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
resultRow.add("0");
ql.results.put("0", resultRow);
}
Truncated results (last 10 of 2058 documents):
20021231 Iraq Belongs on the Back Burner 20021231 With Missionaries Spreading, Muslims' Anger Is Following 20021231 WHITE HOUSE CUTS ESTIMATE OF COST OF WAR WITH IRAQ 20021231 Bring Back the Draft 20040101 Pakistani Leader's New Tactic: Persuasion 20040101 What We Will Do in 2004 20040101 Ethnic Morass Bogs Down Afghan Talks On Charter 20040101 U.S. Hunts Terror Clues in Case of 2 Brothers 20040101 Giving Up Those Weapons: After Libya, Who Is Next? 20040101 An Odd Sight in Iran as Aid Team Tents Go Up: The U.S. Flag