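I'm using Lucene 4.1 to index key/value pairs where the keys and values are not real words, i.e. they are voltages and settings that should not be analyzed or tokenized. For example, $P14R / 16777216. (This is FCS data, as produced by any flow cytometer.)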
For indexing, I create a FieldType with indexed = true, stored = true, and tokenized = false. These mimic the ancient Field.Keyword from Lucene 1, for which I have the book. :-) I even freeze the fieldType.
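For comparison, Lucene 4.x also ships `StringField`, whose `TYPE_STORED` constant is indexed, stored, and not tokenized, which makes it the closest built-in counterpart to `Field.Keyword`. The sketch below assumes lucene-core 4.1 on the classpath; the class name `KeywordFieldTypes` is hypothetical. It builds the hand-made type described above, freezes it, and prints its flags next to the built-in type's.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;

public class KeywordFieldTypes {
    // Hand-built equivalent of Lucene 1.x Field.Keyword:
    // indexed and stored, but never run through the analyzer.
    static FieldType customKeywordType() {
        FieldType ft = new FieldType();
        ft.setIndexed(true);
        ft.setStored(true);
        ft.setTokenized(false);
        ft.freeze(); // any later setXxx() call throws IllegalStateException
        return ft;
    }

    public static void main(String[] args) {
        FieldType custom = customKeywordType();
        // Lucene 4.x built-in type for indexed, untokenized, stored values
        FieldType builtIn = StringField.TYPE_STORED;

        System.out.println("custom:   indexed=" + custom.indexed()
                + " stored=" + custom.stored() + " tokenized=" + custom.tokenized());
        System.out.println("built-in: indexed=" + builtIn.indexed()
                + " stored=" + builtIn.stored() + " tokenized=" + builtIn.tokenized());

        // Both forms create an untokenized, stored field for a key/value pair
        Field a = new Field("$P14R", "16777216", custom);
        Field b = new StringField("$P14R", "16777216", Field.Store.YES);
        System.out.println(a.fieldType().tokenized() + " " + b.fieldType().tokenized());
    }
}
```

One difference worth knowing: `StringField.TYPE_STORED` additionally omits norms and indexes documents only (`IndexOptions.DOCS_ONLY`), whereas the hand-built type keeps the `FieldType` defaults for those settings.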
I can see these values in the debugger. I then create the Document and index it.
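When I read the index back, fetch the document, and look at its fields in the debugger, I see all my fields. The names and field data look correct. However, the FieldType is wrong: it shows indexed = true, stored = true, and tokenized = true. As a result, my searches (using TermQuery) don't work.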
How can I fix this? Thanks.
P.S. I'm using KeywordAnalyzer in the IndexWriterConfig. I'll try to post some demo code later, but that's unrelated to my actual work today. :-)
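To see what `KeywordAnalyzer` itself produces, the sketch below runs a key/value string through it and collects the emitted terms. It assumes Lucene 4.1 with lucene-core and lucene-analyzers-common on the classpath; the class and method names are hypothetical. `KeywordAnalyzer` should emit the whole input, `$` and `=` included, as a single token.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class KeywordAnalyzerTokens {
    // Runs text through an analyzer and collects every term it emits.
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> terms = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                 // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new KeywordAnalyzer();
        // The whole string comes back as one term; nothing is split or lowercased
        System.out.println(analyze(analyzer, "contents", "$P14R=16777216"));
        analyzer.close();
    }
}
```

For a field whose FieldType has tokenized = false, IndexWriter indexes the value as one term without consulting the analyzer, so the analyzer choice mainly matters for tokenized fields and for query parsing.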
Demo code:
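```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LuceneDemo {
    public static void main(String[] args) throws IOException {
        Directory lDir = new RAMDirectory();
        Analyzer analyzer = new KeywordAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41, analyzer);
        iwc.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(lDir, iwc);

        // BTW, Lucene, any way you could make this even more tedious???
        // ever heard of builders, Enums, or even old fashioned bits?
        FieldType keywordFieldType = new FieldType();
        keywordFieldType.setStored(true);
        keywordFieldType.setIndexed(true);
        keywordFieldType.setTokenized(false);
        keywordFieldType.freeze(); // lock the settings; later setXxx() calls throw

        Document doc = new Document();
        doc.add(new Field("$foo", "$bar123", keywordFieldType));
        doc.add(new Field("contents", "$foo=$bar123", keywordFieldType));
        doc.add(new Field("$foo2", "$bar12345", keywordFieldType));
        Field onCreation = new Field("contents", "$foo2=$bar12345", keywordFieldType);
        doc.add(onCreation);
        System.out.println("When creating, the field's tokenized is " + onCreation.fieldType().tokenized());
        writer.addDocument(doc);
        writer.close();

        // Read the stored document back and inspect the first field's FieldType
        IndexReader reader = DirectoryReader.open(lDir);
        Document d1 = reader.document(0);
        Field readBackField = (Field) d1.getFields().get(0);
        System.out.println("When read back the field's tokenized is " + readBackField.fieldType().tokenized());

        IndexSearcher searcher = new IndexSearcher(reader);
        // exact match works
        Term term = new Term("$foo", "$bar123");
        Query query = new TermQuery(term);
        TopDocs results = searcher.search(query, 10);
        System.out.println("when searching for : " + query.toString() + " hits = " + results.totalHits);
        // partial match fails
        term = new Term("$foo", "123");
        query = new TermQuery(term);
        results = searcher.search(query, 10);
        System.out.println("when searching for : " + query.toString() + " hits = " + results.totalHits);
        // wildcard search works
        term = new Term("contents", "*$bar12345");
        query = new WildcardQuery(term);
        results = searcher.search(query, 10);
        System.out.println("when searching for : " + query.toString() + " hits = " + results.totalHits);

        reader.close();
    }
}
```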
The output is:
```
When creating, the field's tokenized is false
When read back the field's tokenized is true
when searching for : $foo:$bar123 hits = 1
when searching for : $foo:123 hits = 0
when searching for : contents:*$bar12345 hits = 1
```
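In this output the exact TermQuery on `$foo` does match, even though the read-back field reports tokenized = true. The Lucene 4.x javadoc for `IndexReader.document(int)` notes that only stored field content is returned and that metadata such as tokenized is not preserved, so the FieldType on a read-back Document may not reflect how the field was actually indexed. A more direct check is to enumerate the terms that are really in the index: an untokenized field should show each value as one unsplit term. The sketch below assumes Lucene 4.1 (lucene-core plus lucene-analyzers-common); the class and method names are hypothetical, and it rebuilds a small index with the same FieldType as the demo.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class IndexedTermsDump {
    // Builds a one-document index using the same untokenized FieldType as the demo.
    static Directory buildDemoIndex() throws IOException {
        FieldType keyword = new FieldType();
        keyword.setIndexed(true);
        keyword.setStored(true);
        keyword.setTokenized(false);
        keyword.freeze();

        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_41, new KeywordAnalyzer()));
        Document doc = new Document();
        doc.add(new Field("$foo", "$bar123", keyword));
        doc.add(new Field("contents", "$foo=$bar123", keyword));
        doc.add(new Field("contents", "$foo2=$bar12345", keyword));
        writer.addDocument(doc);
        writer.close();
        return dir;
    }

    // Lists the terms actually written to the inverted index for one field,
    // independent of the FieldType that IndexReader.document() reports.
    static List<String> indexedTerms(IndexReader reader, String field) throws IOException {
        List<String> result = new ArrayList<String>();
        Terms terms = MultiFields.getTerms(reader, field); // null if the field has no terms
        if (terms == null) {
            return result;
        }
        TermsEnum termsEnum = terms.iterator(null); // Lucene 4.x signature takes a reuse argument
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            result.add(term.utf8ToString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        IndexReader reader = DirectoryReader.open(buildDemoIndex());
        try {
            System.out.println("$foo     -> " + indexedTerms(reader, "$foo"));
            System.out.println("contents -> " + indexedTerms(reader, "contents"));
        } finally {
            reader.close();
        }
    }
}
```

If each whole value appears as a single term, the field was indexed untokenized, and a TermQuery needs the complete value (which is consistent with the `$foo:123` search returning 0 hits).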