I imported my data using Michael Hunger's Batch Import, through which I created:-
4,612,893 nodes
14,495,063 properties
node properties are indexed.
5,300,237 relationships
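{Problem} Cypher queries execute far too slowly, almost at a crawl: a simple traversal takes more than 5 minutes to return its result set. Please let me know how to tune the server for better performance and what I am doing wrong.
Store details:-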
-rw-r--r-- 1 root root 567M Jul 12 12:42 data/graph.db/neostore.propertystore.db
-rw-r--r-- 1 root root 167M Jul 12 12:42 data/graph.db/neostore.relationshipstore.db
-rw-r--r-- 1 root root 40M Jul 12 12:42 data/graph.db/neostore.nodestore.db
-rw-r--r-- 1 root root 7.8M Jul 12 12:42 data/graph.db/neostore.propertystore.db.strings
-rw-r--r-- 1 root root 330 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.keys
-rw-r--r-- 1 root root 292 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.names
-rw-r--r-- 1 root root 153 Jul 12 12:42 data/graph.db/neostore.propertystore.db.arrays
-rw-r--r-- 1 root root 88 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index
-rw-r--r-- 1 root root 69 Jul 12 12:42 data/graph.db/neostore
-rw-r--r-- 1 root root 58 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.nodestore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.arrays.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.keys.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.strings.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshipstore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.names.id
I am using:
neo4j-community-1.9.1
java version "1.7.0_25"
Amazon EC2 m1.large instance with Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-40-virtual x86_64)
RAM ~8GB.
EBS 200 GB, neo4j is running on EBS volume.
Invoked as ./neo4j-community-1.9.1/bin/neo4j start
Below is the neo4j server info:
neostore.nodestore.db.mapped_memory 161M
neostore.relationshipstore.db.mapped_memory 714M
neostore.propertystore.db.mapped_memory 90M
neostore.propertystore.db.index.keys.mapped_memory 1M
neostore.propertystore.db.strings.mapped_memory 130M
neostore.propertystore.db.arrays.mapped_memory 130M
mapped_memory_page_size 1M
all_stores_total_mapped_memory_size 500M
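To put the two listings side by side, here is a rough Python sketch that compares each store file size from the listing above with the mapped memory reported for the same store. The numbers are copied by hand from this post, the helper name `coverage` is made up for illustration, and the result is plain arithmetic rather than a tuning recommendation.

```python
# Store file sizes in MB, copied by hand from the `ls` listing in this post.
store_size_mb = {
    "nodestore": 40,
    "relationshipstore": 167,
    "propertystore": 567,
    "propertystore.strings": 7.8,
}

# Mapped memory in MB, copied by hand from the server info in this post.
mapped_mb = {
    "nodestore": 161,
    "relationshipstore": 714,
    "propertystore": 90,
    "propertystore.strings": 130,
}


def coverage(store):
    """Fraction of the store file that fits in its mapped memory, capped at 1.0.

    Hypothetical helper for this comparison only.
    """
    return min(1.0, mapped_mb[store] / store_size_mb[store])


for store in store_size_mb:
    print(f"{store:24s} file={store_size_mb[store]:>6} MB "
          f"mapped={mapped_mb[store]:>4} MB coverage={coverage(store):.0%}")
```

With these numbers, the property store is the only store whose mapped memory (90 MB) is smaller than its file on disk (567 MB).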
{Data model} It is like a social graph:-
User-User
User-[:FOLLOWS]->User
User-Item
User-[:CREATED]->Item
User-[:LIKE]->Item
User-[:COMMENT]->Item
User-[:VIEW]->Item
Cluster-User
User-[:FACEBOOK]->SocialLogin_Cluster
Cluster-Item
Item-[:KIND_OF]->Type_Cluster
Cluster-Cluster
Cluster-[:KIND_OF]->Type_Cluster
{Some queries} and their timings:
START u=node(467242)
MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b
WHERE NOT(a=b)
RETURN u,COUNT(b)
Query took 1015348 ms. Returned a result count of 70956115.
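The count is far larger than the number of nodes in the graph because the pattern counts matching paths, not distinct nodes. A toy Python sketch of that multiplication follows. It uses made-up data and hypothetical names (`likes`, `liked_by`, `count_paths`), covers only LIKE edges, and is not how Neo4j evaluates the query.

```python
from collections import defaultdict

# Toy LIKE edges (user -> items). Made-up data for illustration only.
likes = {
    "u":   {"a1", "a2"},
    "lu1": {"a1", "a2", "b1", "b2"},
    "lu2": {"a1", "b1", "b2", "b3"},
}

# Reverse index: item -> users who like it.
liked_by = defaultdict(set)
for user, items in likes.items():
    for item in items:
        liked_by[item].add(user)


def count_paths(start):
    """Count start->a<-lu->b paths and the distinct b nodes they reach.

    An edge is never reused within a path, so lu != start and b != a.
    """
    paths, distinct_b = 0, set()
    for a in likes[start]:
        for lu in liked_by[a] - {start}:
            for b in likes[lu] - {a}:
                paths += 1
                distinct_b.add(b)
    return paths, len(distinct_b)


paths, distinct_b = count_paths("u")
print(paths, distinct_b)  # prints "9 5": 9 paths reach only 5 distinct items
```

Even in this tiny graph the same item is reached through several (a, lu) combinations, so the path count grows much faster than the number of distinct end nodes.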
START a=node:nodes(kind="user")
RETURN a,length(a-[:CREATED|LIKE|COMMENT|FOLLOWS]-()) AS cnt
ORDER BY cnt DESC
LIMIT 10
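Query took 231613 ms.
As suggested, I upgraded the box to M1.xlarge and M2.2xlarge:
- M1.xlarge (vCPU: 4, ECU: 8, RAM: 15 GB, instance storage: ~600 GB)
- M2.2xlarge (vCPU: 4, ECU: 13, RAM: 34 GB, instance storage: ~800 GB)
I tuned the properties as below and ran from instance storage (instead of EBS):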
neo4j.properties
neostore.nodestore.db.mapped_memory=1800M
neostore.relationshipstore.db.mapped_memory=1800M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=150M
neostore.propertystore.db.arrays.mapped_memory=10M
neo4j-wrapper.conf
wrapper.java.additional.1=-d64
wrapper.java.additional.1=-server
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.initmemory=4098
wrapper.java.maxmemory=8192
But the queries (such as the one below) still take roughly 5-8 minutes to run, which is not acceptable from a recommendation point of view.
Query:
START u=node(467242)
MATCH u-[r1:LIKE]->a<-[r2:LIKE]-lu-[r3:LIKE]-b
RETURN u,COUNT(b)
{Profiling}
neo4j-sh (0)$ profile START u=node(467242) MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b RETURN u,COUNT(b);
==> +-------------------------+
==> | u | COUNT(b) |
==> +-------------------------+
==> | Node[467242] | 70960482 |
==> +-------------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=["u", " INTERNAL_AGGREGATEad2ab10d-cfc3-48c2-bea9-be4b9c1b5595"], returnItemNames=["u", "COUNT(b)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=["u"], aggregates=["( INTERNAL_AGGREGATEad2ab10d-cfc3-48c2-bea9-be4b9c1b5595,Count)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(u)-[r1:LIKE|COMMENT WHERE true AND true]->(a)<-[r2:LIKE|COMMENT WHERE true AND true]-(lu)-[r3:LIKE WHERE true AND true]-(b)", _rows=70960482, _db_hits=71452891)
==> ParameterPipe(_rows=1, _db_hits=0)
neo4j-sh (0)$ profile START u=node(467242) MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b RETURN count(distinct a),COUNT(distinct b),COUNT(*);
==> +--------------------------------------------------+
==> | count(distinct a) | COUNT(distinct b) | COUNT(*) |
==> +--------------------------------------------------+
==> | 1950 | 91294 | 70960482 |
==> +--------------------------------------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATEe6b94644-0a55-43d9-8337-491ac0b29c8c", " INTERNAL_AGGREGATE1cfcd797-7585-4240-84ef-eff41a59af33", " INTERNAL_AGGREGATEea9176b2-1991-443c-bdd4-c63f4854d005"], returnItemNames=["count(distinct a)", "COUNT(distinct b)", "COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEe6b94644-0a55-43d9-8337-491ac0b29c8c,Distinct)", "( INTERNAL_AGGREGATE1cfcd797-7585-4240-84ef-eff41a59af33,Distinct)", "( INTERNAL_AGGREGATEea9176b2-1991-443c-bdd4-c63f4854d005,CountStar)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(u)-[r1:LIKE|COMMENT WHERE true AND true]->(a)<-[r2:LIKE|COMMENT WHERE true AND true]-(lu)-[r3:LIKE WHERE true AND true]-(b)", _rows=70960482, _db_hits=71452891)
==> ParameterPipe(_rows=1, _db_hits=0)
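For a sense of scale, the profile figures above can be reduced to a few ratios. This is a rough Python sketch: the constants are copied from the two profile outputs, the variable names are mine, and it only restates the reported numbers.

```python
# Figures copied from the profile outputs above.
rows = 70_960_482       # _rows of TraversalMatcher, also COUNT(*)
db_hits = 71_452_891    # _db_hits of TraversalMatcher
distinct_a = 1_950      # count(distinct a)
distinct_b = 91_294     # COUNT(distinct b)

# How many matched paths end at each distinct b, on average.
paths_per_b = rows / distinct_b
# How many matched paths pass through each distinct a, on average.
paths_per_a = rows / distinct_a
# Store accesses per matched path.
db_hits_per_row = db_hits / rows

print(f"paths per distinct b: {paths_per_b:.1f}")
print(f"paths per distinct a: {paths_per_a:.1f}")
print(f"db hits per row:      {db_hits_per_row:.3f}")
```

In other words, each distinct b is reached by several hundred paths on average, and the traversal does roughly one store access per path.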
Please let me know which configuration settings and neo4j startup arguments I should use for tuning. Thanks in advance.