I created a test dataset, modeled on some client data, with the following structure:
                  *1   100k    *3..30
   [:author]--------> person <---------[:member]
       |                 |                 |
       |               *2..8               |
       |             [:topic]              |
       |                 |                 |
       |   *3..8         V     *3..8       |
    article-[:topic]-> topic <-[:topic]- project
      1m                1k                 20k
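The numbers mean, for example, that I have 1 million articles, each article has exactly one author, and each article has 3 to 8 [:topic] relationships, each pointing to one of the 1,000 topics.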
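Now I have two questions:
First: does this layout make sense? The topics become supernodes. Would it be better to store them as properties on the nodes instead?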
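To put a rough number on the supernode concern, here is a back-of-the-envelope estimate in Python of how many [:topic] relationships end at each topic node. The node counts and ranges come from the diagram above. The uniform distribution within each range and the even spread over topics are my own simplifying assumptions, not something measured on the dataset.

```python
# Back-of-the-envelope estimate of how many [:topic] relationships
# end at each topic node, using the counts from the diagram.
# Assumption (not measured): relationship counts are uniform within
# each range and are spread evenly over all topics.

TOPICS = 1_000

# (source label, node count, min rels per node, max rels per node)
SOURCES = [
    ("article", 1_000_000, 3, 8),
    ("person", 100_000, 2, 8),
    ("project", 20_000, 3, 8),
]


def expected_topic_degree(sources, topics):
    """Return (average incoming rels per topic by label, overall total)."""
    per_label = {}
    for label, count, lo, hi in sources:
        mean_rels = (lo + hi) / 2  # mean of a uniform integer range
        per_label[label] = count * mean_rels / topics
    return per_label, sum(per_label.values())


per_label, total = expected_topic_degree(SOURCES, TOPICS)
for label, degree in per_label.items():
    print(f"{label:8s} ~{degree:,.0f} [:topic] rels per topic")
print(f"total    ~{total:,.0f} [:topic] rels per topic")
```

Under these assumptions each topic ends up with roughly 6,110 incoming [:topic] relationships, about 5,500 of them from articles.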
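Second, I see a huge performance difference between queries depending on how I phrase them. The idea is to find people who have worked together on a project and share an interest in a topic:
Query 1: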
match
n:person<-[:member]-p:project-[:member]->m,
n-[:topic]->t,
m-[:topic]->t
where
n.name='person1215' and
n<>m
return
m,t;
This returns in 1500 ms to 9000 ms.
Query 2 is much faster:
match
n:person<-[:member]-p:project-[:member]->m,
n-[:topic]->t1,
m-[:topic]->t2
where
n.name='person1250' and
n<>m and
t1=t2
return
t1,m;
This returns in 200 ms to 400 ms.
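Both queries are meant to compute the same thing. As a plain-Python sketch of the intended result, over a tiny made-up graph: every name and data value below is hypothetical. Unlike the Cypher queries, which return one row per matching path, this sketch collapses duplicates into a set.

```python
# Toy in-memory version of the intended result: people who share a
# project with a given person, paired with each topic both link to.
# All names and data below are hypothetical.

members = {  # project -> people with a [:member] relationship
    "project1": {"alice", "bob", "carol"},
    "project2": {"alice", "dave"},
}
topics = {  # person -> topics reached via a [:topic] relationship
    "alice": {"graphs", "search"},
    "bob": {"graphs"},
    "carol": {"databases"},
    "dave": {"search", "graphs"},
}


def coworkers_with_shared_topics(name, members, topics):
    """Return the set of (coworker, topic) pairs for `name`."""
    result = set()
    own_topics = topics.get(name, set())
    for people in members.values():
        if name not in people:
            continue
        for other in people - {name}:  # corresponds to n <> m
            # Set intersection plays the role of the shared t (or t1 = t2).
            for topic in own_topics & topics.get(other, set()):
                result.add((other, topic))
    return result


pairs = coworkers_with_shared_topics("alice", members, topics)
print(sorted(pairs))
```

For this toy data, "alice" shares the topic "graphs" with "bob", and both "graphs" and "search" with "dave". "carol" is on the same project but has no topic in common, so she is not returned.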
Second: why is query 2 so much faster? How could one tell this just by looking at the query?
Profiler output for query 1:
ColumnFilter(symKeys=["n", "t", "m", " UNNAMED14", "p", " UNNAMED51", " UNNAMED35", " UNNAMED66"], returnItemNames=["m", "t"], _rows=3, _db_hits=0)
Filter(pred="((NOT(n == m) AND hasLabel(p: project)) AND hasLabel(p: project))", _rows=3, _db_hits=0)
PatternMatch(g="(p)-[' UNNAMED35']-(m),(p)-[' UNNAMED14']-(n),(m)-[' UNNAMED66']-(t),(n)-[' UNNAMED51']-(t)", _rows=3, _db_hits=0)
SchemaIndex(identifier="n", _db_hits=0, _rows=1, label="person", query="Literal(person1215)", property="name")
And for query 2:
ColumnFilter(symKeys=["n", " UNNAMED67", "m", " UNNAMED14", "t2", "p", " UNNAMED51", "t1", " UNNAMED35"], returnItemNames=["t1", "m"], _rows=2, _db_hits=0)
Filter(pred="(((NOT(n == m) AND t1 == t2) AND hasLabel(p: project)) AND hasLabel(p: project))", _rows=2, _db_hits=0)
PatternMatch(g="(p)-[' UNNAMED35']-(m),(p)-[' UNNAMED14']-(n),(m)-[' UNNAMED67']-(t2),(n)-[' UNNAMED51']-(t1)", _rows=2, _db_hits=0)
SchemaIndex(identifier="n", _db_hits=0, _rows=1, label="person", query="Literal(person1250)", property="name")
Thanks a lot,
Jörg