neo4j - How do I build a custom Lucene index for a Neo4j graph?

Question

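I am using Gremlin and Neo4j to manipulate the ENRON dataset from infochimps. This dataset has two types of vertices, Message and Email Address, and two types of edges, SENT and RECEVIED_BY. I would like to create a custom index on this dataset that creates a Lucene document for each vertex of type: 'Message' and incorporates information from associated vertices (e.g., v.in() and v.out()) as additional fields in that Lucene document.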

I am thinking of code along the following lines:

g = new Neo4jGraph('enron');

PerFieldAnalyzerWrapper analyzer =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("sender", new KeywordAnalyzer());
analyzer.addAnalyzer("recipient", new KeywordAnalyzer());

IndexWriter idx = new IndexWriter (dir,analyzer,IndexWriter.MaxFieldLength.UNLIMITED);

g.V.filter{it.type == 'Message'}.each { v ->
    Document doc = new Document();
    doc.add(new Field("subject", v.subject));
    doc.add(new Field("body", v.body));
    doc.add(new Field("sender", v.in().address));
    v.out().each { recipient -> 
        doc.add(new Field("recipient", recipient.address));
    }
    idx.addDocument(doc);
}
idx.close();

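My questions are:

1) Is there a better way to enumerate the vertices for indexing?
2) Can I use auto-indexing for this? If so, how do I specify what should be indexed?
3) Can I specify my own Analyzer, or am I stuck with the default? What is the default?
4) If I must create my own index, should I use Gremlin for this, or am I better off with a Java program?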

score 0 · Accepted Answer

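I have just finished an import using a Java process. It was really easy and, in my opinion, better than going through Gremlin.

In any case, if the process fails, it is because you cannot create a StandardAnalyzer with no arguments. All constructors of that class require parameters, so you should either create a wrapper class or pass the appropriate Lucene version as a constructor parameter.

As of this writing, Neo4j only accepts Lucene versions up to 36 (Version.LUCENE_36).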
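
To make the constructor point concrete, here is a minimal sketch, assuming Lucene 3.6 is on the classpath. The class name AnalyzerSketch, the in-memory RAMDirectory, and the sample field values are illustrative and not part of the original code. It also uses the four-argument Field constructor, since in Lucene 3.x a string-valued Field takes Field.Store and Field.Index arguments as well.

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper class; only the Lucene calls come from the library.
public class AnalyzerSketch {

    // Indexes one sample document and returns how many documents
    // match an exact lookup on the "sender" field.
    public static int indexAndCount() throws Exception {
        Version version = Version.LUCENE_36;

        // StandardAnalyzer needs an explicit Version argument.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(version));
        // Addresses are kept as single, untokenized terms.
        analyzer.addAnalyzer("sender", new KeywordAnalyzer());
        analyzer.addAnalyzer("recipient", new KeywordAnalyzer());

        // In-memory directory, used here only to keep the sketch self-contained.
        Directory dir = new RAMDirectory();
        IndexWriter writer =
            new IndexWriter(dir, new IndexWriterConfig(version, analyzer));

        Document doc = new Document();
        doc.add(new Field("subject", "Quarterly numbers",
                          Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("sender", "alice@example.com",
                          Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // Exact-term lookup on the keyword-analyzed field.
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        int hits = searcher.search(
            new TermQuery(new Term("sender", "alice@example.com")), 10).totalHits;
        reader.close();
        return hits;
    }
}
```

Because "sender" goes through KeywordAnalyzer, the whole address is stored as one term, which is why a plain TermQuery on the full address can find it.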

score 0 · Accepted Answer

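I will talk about direct Neo4j access here, since I am not well travelled in Gremlin.

So you want to build a Lucene index "outside" of the graph itself? Otherwise, you can use the built-in graphDb.index().forNodes("myIndex", configForMyIndex) to get (created on demand) a Lucene index associated with Neo4j. You can then add multiple fields to each document by calling index.add(node, key, value), where each node is represented by one document in that Lucene index.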
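
The built-in approach could look roughly like the following sketch, which assumes the Neo4j 1.x embedded Java API. The class name MessageIndexSketch, the index name "messages", and the "fulltext" type are choices made for illustration; the property keys (type, subject, body, address) and the edge directions are taken from the question's Gremlin code and may not match the real dataset.

```java
import java.util.Map;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.helpers.collection.MapUtil;

// Hypothetical helper class; names and property keys are assumptions.
public class MessageIndexSketch {

    // Adds one index entry set per Message node, pulling sender and
    // recipient addresses from the neighbouring nodes.
    public static Index<Node> indexMessages(GraphDatabaseService graphDb) {
        Map<String, String> config = MapUtil.stringMap("type", "fulltext");

        Transaction tx = graphDb.beginTx();
        try {
            // Returns the existing index or creates it on demand.
            Index<Node> index = graphDb.index().forNodes("messages", config);

            for (Node node : graphDb.getAllNodes()) {
                if (!"Message".equals(node.getProperty("type", null))) {
                    continue; // skip Email Address nodes and anything untyped
                }
                index.add(node, "subject", node.getProperty("subject"));
                index.add(node, "body", node.getProperty("body"));

                // Incoming edges: the sender points at the message.
                for (Relationship r : node.getRelationships(Direction.INCOMING)) {
                    index.add(node, "sender",
                              r.getStartNode().getProperty("address"));
                }
                // Outgoing edges: the message points at each recipient.
                for (Relationship r : node.getRelationships(Direction.OUTGOING)) {
                    index.add(node, "recipient",
                              r.getEndNode().getProperty("address"));
                }
            }
            tx.success();
            return index;
        } finally {
            tx.finish();
        }
    }
}
```

A single transaction keeps the sketch short; for a dataset the size of ENRON it is probably wiser to commit in batches. A lookup would then be something like index.query("subject", "numbers").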

1) In Gremlin... I don't know.

2) See http://docs.neo4j.org/chunked/milestone/auto-indexing.html

3) See http://docs.neo4j.org/chunked/milestone/indexing-create-advanced.html

4) Do you need to create it completely outside of the database? If so, why?
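
For point 2, auto-indexing can be switched on from code as well as from configuration. The sketch below assumes the Neo4j 1.x embedded API; the class name AutoIndexSketch and the choice of the subject and body properties are illustrative. Auto-indexing only covers a node's own properties, so sender and recipient addresses that live on neighbouring nodes would not be picked up this way.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.AutoIndexer;

// Hypothetical helper class; the property keys come from the question.
public class AutoIndexSketch {

    // Enables node auto-indexing for the given database and registers
    // the message properties that should be indexed from now on.
    public static AutoIndexer<Node> enable(GraphDatabaseService graphDb) {
        AutoIndexer<Node> autoIndexer = graphDb.index().getNodeAutoIndexer();
        autoIndexer.startAutoIndexingProperty("subject");
        autoIndexer.startAutoIndexingProperty("body");
        autoIndexer.setEnabled(true);
        return autoIndexer;
    }
}
```

Nodes written after this call can be looked up through autoIndexer.getAutoIndex(), for example getAutoIndex().get("subject", value). Nodes that existed before auto-indexing was enabled are not indexed until their properties are set again.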
