jena - Creating a Lucene index for an existing Apache Jena TDB to enable text search

Question

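I have a large Apache Jena TDB store, and I want to build a Lucene index over it with Apache Jena 2.10.2 so that I can use the new text search feature. I have found the documentation hard to follow.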

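I first tried setting up the configuration in code, but ran into dependency problems. Every combination of lucene-core and solr-solrj I tried results in either a "classNotFound" error or a "StandardAnalyzer overrides final method tokenStream" error. Code example: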

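import java.io.File;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.DatasetFactory;
import com.hp.hpl.jena.vocabulary.RDFS;
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.query.text.TextDatasetFactory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// Base data: an in-memory dataset
Dataset ds1 = DatasetFactory.createMem() ;

// Index mapping: entity URI field "uri", text field "text", indexing rdfs:label
EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label) ;

// Lucene index held in memory
Directory dir = new RAMDirectory();

// Have also tried creating the index in a file instead:
// File indexDir = new File("luceneIndexes");
// Directory dir = FSDirectory.open(indexDir);

// Fails on this line
Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ;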

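I suspect the only solution may be to create a text dataset assembler, but if anyone has suggestions for setting it up in code, I would prefer to do that.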

score 1 · Accepted Answer

That example is taken directly from the Jena examples, and it does work.

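It looks like you have a mix-up in jar versions. Have you tried using Maven to resolve the dependencies? Running "mvn dependency:tree" will show which versions are actually being used.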
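As a complement to "mvn dependency:tree", the following is a minimal sketch (the class name WhichJar is hypothetical, not part of Jena or Lucene) that reports which jar each class is actually loaded from at runtime, using only the standard ProtectionDomain/CodeSource reflection API. Errors such as "overrides final method" typically appear when classes from different Lucene versions end up on the same classpath, so if Analyzer and StandardAnalyzer resolve to jars of different versions, that points to the mix-up described above. The class names in main are assumptions based on the Lucene 4.x and jena-text package layout.

```java
import java.security.CodeSource;

public class WhichJar {

    /**
     * Returns the jar or directory a class was loaded from, "not found" if it is
     * not on the classpath, or the linkage error raised while loading it.
     */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source
                return "bootstrap/unknown";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            // e.g. a VerifyError when a class overrides a method that is final in another jar's version
            return "failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumed class names (Lucene 4.x and jena-text); pass your own as arguments if needed
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.lucene.analysis.Analyzer",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.jena.query.text.TextDatasetFactory"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with exactly the same classpath as the failing application; the printed locations show which jar won for each class.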

jena-text is built against Lucene 4.3.1 or Solr 4.3.1.
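Since jena-text expects Lucene/Solr 4.3.1, one quick sanity check is to scan the runtime classpath for Lucene or Solr jars whose file name carries a different version. The sketch below is hypothetical (LuceneVersionCheck is not part of Jena or Lucene) and assumes the jars follow the usual Maven "artifact-version.jar" naming; jars that were renamed, or whose classes were shaded into another artifact, will not be detected.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LuceneVersionCheck {

    // Matches e.g. "lucene-core-4.3.1.jar" or "solr-solrj-4.3.1.jar"; group 1 = artifact, group 2 = version
    private static final Pattern JAR =
        Pattern.compile("((?:lucene|solr)-[A-Za-z0-9-]+?)-(\\d+(?:\\.\\d+)*)\\.jar");

    /** Returns the file names of Lucene/Solr jars whose version differs from the expected one. */
    static List<String> mismatches(String classPath, String expectedVersion) {
        List<String> bad = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            Matcher m = JAR.matcher(name);
            if (m.matches() && !m.group(2).equals(expectedVersion)) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Default to 4.3.1, the version jena-text is built against; override via the first argument
        String expected = args.length > 0 ? args[0] : "4.3.1";
        List<String> bad = mismatches(System.getProperty("java.class.path"), expected);
        if (bad.isEmpty()) {
            System.out.println("All Lucene/Solr jars on the classpath are " + expected);
        } else {
            System.out.println("Version mismatch (expected " + expected + "): " + bad);
        }
    }
}
```

For example, a classpath containing lucene-core-4.3.1.jar together with an older lucene-core-3.6.2.jar would list the 3.6.2 jar as a mismatch, which is the kind of conflict that Maven dependency exclusions are meant to remove.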

Check the POM at: https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-text/1.0.0-SNAPSHOT/
