py2neo - The right way to hydrate lots of entities in py2neo

Question

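This is more of a best-practices question. I am implementing a search back end for highly structured data that essentially consists of ontologies, terms, and a complex set of mappings between them. Neo4j seemed like a natural fit, and after some prototyping I decided to use py2neo to communicate with neo4j, mostly because of its good support for batch operations.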

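What frustrates me is that I cannot introduce the kind of high-level abstraction I would like in my code. I am stuck using the objects directly as a mini-ORM, but then I make lots and lots of atomic REST calls, which kills performance (I have a fairly large data set).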

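What I have been doing is taking my query results and using get_properties on them in a batch to hydrate my objects. This performs well, which is why I went down this route in the first place, but it forces me to pass (node, properties) tuples around in my code. That gets the job done, but it isn't pretty. At all.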

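So I guess what I am asking is whether there is a best practice for working with a fairly rich object graph in py2neo that gives me ORM-like niceties while keeping performance (which in my case means doing as much as possible in batch queries).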
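
To illustrate the (node, properties) tuples described above, here is a minimal sketch of one way to hide them behind a thin wrapper that is filled from a single bulk call. The names Entity, hydrate, fetch_properties and fake_fetch are hypothetical and not part of py2neo; fetch_properties stands in for whatever batched property lookup is used (for example, the batched get_properties call mentioned in the question), and fake_fetch is only an in-memory stand-in so the sketch runs without a database.

```python
class Entity(object):
    """Hypothetical wrapper pairing a node handle with pre-fetched properties."""

    def __init__(self, node, properties):
        self.node = node
        self._properties = dict(properties)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so 'node' and
        # '_properties' themselves are never routed through here.
        try:
            return self.__dict__["_properties"][name]
        except KeyError:
            raise AttributeError(name)


def hydrate(nodes, fetch_properties):
    """Build Entity objects from one bulk property lookup.

    fetch_properties is an assumed callable that takes a list of nodes and
    returns their property dicts in the same order, ideally in one round trip.
    """
    nodes = list(nodes)
    return [Entity(n, p) for n, p in zip(nodes, fetch_properties(nodes))]


def fake_fetch(nodes):
    # In-memory stand-in for a batched database call (assumption for the demo).
    store = {"n1": {"name": "Alice"}, "n2": {"name": "Bob"}}
    return [store[n] for n in nodes]


people = hydrate(["n1", "n2"], fake_fetch)
print([p.name for p in people])
```

The rest of the code then sees objects with attributes, while the number of round trips stays at one per hydrate call rather than one per object.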

score 4 · Accepted Answer

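I'm not sure I understand exactly what you want, but I had a similar problem. I wanted to make a lot of calls and create a lot of nodes, indexes and relationships (around 1.2 million). Here is an example of adding nodes, relationships, indexes and labels in a batch using py2neo: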

from py2neo import neo4j, node, rel
gdb = neo4j.GraphDatabaseService("<url_of_db>")
batch = neo4j.WriteBatch(gdb)

a = batch.create(node(name='Alice'))
b = batch.create(node(name='Bob'))

batch.set_labels(a, "Female")
batch.set_labels(b, "Male")

# add_indexed_node creates the index 'Name' if it does not exist yet
batch.add_indexed_node("Name", "first_name", "alice", a)
batch.add_indexed_node("Name", "first_name", "bob", b)

batch.create(rel(a, "KNOWS", b))  # add a relationship in the same batch

# send all queued operations to the server in one request;
# ideally a single submit should carry around 2k-5k records
batch.submit()

score 2 · Accepted Answer

Since you asked for best practices, here is one issue I ran into:

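When adding a lot of nodes (~1M) in a batch with py2neo, my program often slows down or crashes when the neo4j server runs out of memory. As a workaround, I split the submit into multiple batches: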

from py2neo import neo4j

def chunker(seq, size):
    """
    Chunker gets a list and returns slices 
    of the input list with the given size.
    """
    for pos in range(0, len(seq), size):
        yield seq[pos:pos + size]


def submit(graph_db, list_of_elements, size):
    """
    Batch submit lots of nodes.
    """

    # chunk data
    for chunk in chunker(list_of_elements, size):

        batch = neo4j.WriteBatch(graph_db)

        for element in chunk:
            n = batch.create(element)
            batch.add_labels(n, 'Label')

        # submit batch for chunk
        batch.submit()
        batch.clear()

I tried different chunk sizes. For me, about 1000 nodes per batch was fastest, but I guess this depends on the RAM/CPU of your neo4j server.
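
The chunker above relies on len() and slicing, so the input has to be a list that is already in memory. If the elements come from a generator instead (for example, rows read from a file), a variant based on itertools.islice might look like the sketch below. The name iter_chunks is only illustrative, and the sketch deliberately leaves out the py2neo calls so it can be run on its own.

```python
from itertools import islice


def iter_chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls the next `size` items without needing len()
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


for chunk in iter_chunks((n * n for n in range(5)), 2):
    print(chunk)
```

Each yielded chunk could then be fed to one batch and submitted, in the same way as in the submit function above.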
