java - Spring Data/Neo4j path length with large datasets

Question

I have been running the following query to find the relatives within a certain "distance" of a given person:

@Query("start person=node({0}), relatives=node:__types__(className='Person') match p=person-[:PARTNER|CHILD*]-relatives where LENGTH(p) <= 2*{1} return distinct relatives")
Set<Person> getRelatives(Person person, int distance);

The 2*{1} comes from one conceptual "hop" between people being represented as two nodes: a Person and a Partnership.
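To make the 2*{1} factor concrete, here is a minimal in-memory sketch, not Neo4j code. The class RelativesBfs and its methods are hypothetical; Person and Partnership nodes are both plain long ids, and relationships are treated as undirected, like the -[:PARTNER|CHILD*]- pattern. A breadth-first search that stops after 2 * distance relationships collects the same Person nodes a bounded pattern would reach (apart from the start person itself), and it only ever touches nodes inside that radius.

```java
import java.util.*;

/**
 * Hypothetical in-memory model: Person and Partnership are both plain nodes,
 * so one family "hop" (Person -> Partnership -> Person) spans two relationships.
 */
public class RelativesBfs {

    // Adjacency list over node ids; edges are stored in both directions,
    // mirroring the undirected pattern -[:PARTNER|CHILD*]- in the query.
    private final Map<Long, List<Long>> adjacency = new HashMap<>();
    private final Set<Long> personIds = new HashSet<>();

    public void addPerson(long id) {
        personIds.add(id);
        adjacency.computeIfAbsent(id, k -> new ArrayList<>());
    }

    public void addPartnership(long id) {
        adjacency.computeIfAbsent(id, k -> new ArrayList<>());
    }

    public void connect(long a, long b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Person ids reachable within 2 * distance relationships, excluding the start. */
    public Set<Long> relatives(long start, int distance) {
        int maxDepth = 2 * distance; // same factor as 2*{1} in the query
        Set<Long> visited = new HashSet<>(List.of(start));
        Deque<Long> frontier = new ArrayDeque<>(List.of(start));
        Set<Long> result = new TreeSet<>();
        // Expand one relationship level at a time and stop at maxDepth,
        // so nodes further away are never visited at all.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<Long> next = new ArrayDeque<>();
            for (long node : frontier) {
                for (long neighbour : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                        if (personIds.contains(neighbour)) {
                            result.add(neighbour); // only Person nodes count as relatives
                        }
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

For example, with one Partnership linking persons 1 and 2 to child 3, a distance of 1 reaches persons 2 and 3, while a second Partnership hanging off person 3 only comes into range at distance 2.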

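So far this has all worked well on test populations. Now I'm moving on to real data, with sizes of 1 to 10 million, and the query takes forever (also from the data browser in the web interface).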

Assuming the cost was in loading everything into ancestors, I rewrote the query as a test in the data browser:

start person=node(385716) match p=person-[:PARTNER|CHILD*1..10]-relatives where relatives.__type__! = 'Person' return distinct relatives

This runs fine on the same data store, in a fraction of a second. But when I try to put it back into Java:

@Query("start person=node({0}) match p=person-[:PARTNER|CHILD*1..{1}]-relatives where relatives.__type__! = 'Person' return relatives")
Set<Person> getRelatives(Person person, int distance);

That doesn't work:

[...]
Nested exception is Properties on pattern elements are not allowed in MATCH.
"start person=node({0}) match p=person-[:PARTNER|CHILD*1..{1}]-relatives where relatives.__type__! = 'Neo4jPerson' return relatives"
                                         ^

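Is there a better way to limit the path length? I'd rather not use a where clause, because that would involve loading all paths, potentially millions of nodes, when I only need to go to a depth of 10. Presumably that wouldn't leave me any better off.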

Any ideas would be greatly appreciated!

Michael to the rescue!

My solution:

public Set<Person> getRelatives(final Person person, final int distance) {

    final String query = "start person=node(" + person.getId() + ") "
        + "match p=person-[:PARTNER|CHILD*1.." + 2 * distance + "]-relatives "
        + "where relatives.__type__! = '" + Person.class.getSimpleName() + "' "
        + "return distinct relatives";

    return this.query(query);

    // Where I would previously instead have called 
    // return personRepository.getRelatives(person, distance);
}

public Set<Person> query(final String q) {

    final EndResult<Person> result = this.template.query(q, MapUtil.map()).to(Neo4jPerson.class);
    final Set<Person> people = new HashSet<Person>();

    for (final Person p : result) {
        people.add(p);
    }

    return people;
}

It runs very fast!
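The solution above splices both the node id and the depth into the query string. Per the accepted answer, only the variable-length bound has to be written literally. Below is a minimal sketch of a builder that inlines just that bound and keeps the start node id and the type name as named parameters. The class RelativesQuery is hypothetical, and the sketch assumes the Cypher 1.x {name} parameter syntax accepts the start node id and the __type__ value as parameters; check that against your Neo4j version.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds the relatives query for a given distance. */
public final class RelativesQuery {

    private RelativesQuery() {}

    /**
     * Only the upper bound of the variable-length pattern is spliced into the
     * text; the start node id and the type name stay named parameters.
     */
    public static String text(int distance) {
        if (distance < 1) {
            // Reject values that would produce a meaningless *1..0 (or negative) bound.
            throw new IllegalArgumentException("distance must be >= 1: " + distance);
        }
        int maxHops = 2 * distance; // Person -> Partnership -> Person = 2 relationships
        return "start person=node({id}) "
            + "match p=person-[:PARTNER|CHILD*1.." + maxHops + "]-relatives "
            + "where relatives.__type__! = {type} "
            + "return distinct relatives";
    }

    /** Parameter map meant to accompany text(distance). */
    public static Map<String, Object> params(long personId, String typeName) {
        Map<String, Object> params = new HashMap<>();
        params.put("id", personId);
        params.put("type", typeName);
        return params;
    }
}
```

The two results would take the place of the concatenated string and MapUtil.map() as the two arguments of template.query(...). Because distance is an int that is validated up front, the only text spliced into the query is a small positive number.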

score 1 · Accepted Answer

You're almost there :)

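Your first query is a full graph scan: it effectively loads the whole database into memory and pulls every node through this pattern match multiple times.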

So it won't be fast, and it will also return huge datasets; I'm not sure that's what you want.

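The second query looks good. The only problem is that you can't parameterize the min/max of a variable-length relationship, because of how that would affect query optimization and caching.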

So for now you have to use template.query(), or separate query methods in your repository for the different max values.
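One way to read "separate query methods for the different max values" is a fixed set of query texts, each with its bound written as a literal. The sketch below is plain Java rather than Spring Data code, and everything in it is hypothetical: the class name, the constant names, and the supported distances 1, 2 and 5 are illustrative only.

```java
/**
 * Hypothetical sketch of "one query per max value": each supported distance
 * has its own query text with the bound written as a literal, the way
 * separate repository methods would each carry their own query.
 */
public final class FixedDepthQueries {

    private FixedDepthQueries() {}

    private static final String PREFIX =
        "start person=node({0}) match p=person-[:PARTNER|CHILD*1..";
    private static final String SUFFIX =
        "]-relatives where relatives.__type__! = 'Person' return distinct relatives";

    // Bounds are 2 * distance: each family hop is Person -> Partnership -> Person.
    static final String WITHIN_1 = PREFIX + "2" + SUFFIX;
    static final String WITHIN_2 = PREFIX + "4" + SUFFIX;
    static final String WITHIN_5 = PREFIX + "10" + SUFFIX;

    /** Picks the fixed query for a supported distance; anything else is rejected. */
    static String forDistance(int distance) {
        switch (distance) {
            case 1: return WITHIN_1;
            case 2: return WITHIN_2;
            case 5: return WITHIN_5;
            default:
                throw new IllegalArgumentException("no fixed query for distance " + distance);
        }
    }
}
```

Because each constant is a compile-time constant expression, Java would also accept it as an annotation value, for example on one @Query method per supported distance. The trade-off against the template.query() route is a closed set of depths in exchange for keeping the repository-interface style.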
