groovy - "Supernodes" in Titan

Question

I'm developing an application that works well with a graph database (Titan), except that it has problems with vertices that have many edges, i.e. supernodes.

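The supernodes link above points to a blog post by the Titan authors that explains a way to resolve the problem. The solution seems to be to reduce the number of vertices retrieved by filtering on edges.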

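Unfortunately, I want to groupCount attributes of edges or vertices. For example, I have 1 million users, and each user belongs to a country. How can I do a fast groupCount to work out the number of users in each country?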

What I've tried so far is shown in this elaborate groovy script:

g = TitanFactory.open('titan.properties')  // Cassandra
r = new Random(100)
people = 1e6

def newKey(g, name, type) {
    return g
        .makeType()
        .name(name)
        .simple()
        .functional()
        .indexed()
        .dataType(type)
        .makePropertyKey()
}

def newLabel(g, name, key) {
    return g
        .makeType()
        .name(name)
        .primaryKey(key)
        .makeEdgeLabel()
}

country = newKey(g, 'country', String.class)
newLabel(g, 'lives', country)

g.stopTransaction(SUCCESS)

root = g.addVertex()
countries = ['AU', 'US', 'CN', 'NZ', 'UK', 'PL', 'RU', 'NL', 'FR', 'SP', 'IT']

(1..people).each {
    country = countries[(r.nextFloat() * countries.size()).toInteger()]
    g.startTransaction()
    person = g.addVertex([name: 'John the #' + it])
    g.addEdge(g.getVertex(root.id), person, 'lives', [country: country])
    g.stopTransaction(SUCCESS)
}

t0 = new Date().time

m = [:]    
root = g.getVertex(root.id)
root.outE('lives').country.groupCount(m).iterate()

t1 = new Date().time

println "groupCount seconds: " + ((t1 - t0) / 1000)

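Basically there is one root node (because Titan has no "all nodes" lookup), linked to many person vertices via edges that carry the country property. When I run groupCount() over the 1 million vertices, it takes a minute.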

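I realise Titan is probably iterating over each edge and collecting counts, but is there a way to make this run faster in Titan, or in any other graph database? Can the index itself be counted, so that no traversal is needed? Are my indexes correct?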

score 8 · Accepted Answer

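If you make 'country' a primary key for the 'lives' label, you can retrieve all people for a particular country more quickly. In your case, however, you are interested in a group count, which requires retrieving all edges of that root node in order to iterate over them and bucket them by country.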
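To make the difference concrete, here is a minimal sketch in plain Groovy with no Titan involved. The list of maps is a hypothetical stand-in for the root vertex's 'lives' edges, and the grouped map only imitates the per-country layout a primary key might give you; none of this is Titan's API.

```groovy
// Hypothetical in-memory stand-in for the root vertex's 'lives' edges.
// Each map plays the role of one edge; this is not Titan's API.
def r = new Random(100)
def countries = ['AU', 'US', 'CN', 'NZ']
def edges = (1..10000).collect {
    [label: 'lives', country: countries[r.nextInt(countries.size())]]
}

// Group count: every edge is visited once, so the cost grows with the
// number of edges on the vertex.
def counts = [:].withDefault { 0 }
edges.each { counts[it.country] += 1 }

// Single-country lookup: assume the edges were bucketed by country ahead
// of time (an imitation of what a primary key on the label provides).
// Building the buckets here scans everything once; the lookup afterwards
// only touches the matching bucket.
def byCountry = edges.groupBy { it.country }
def auOnly = byCountry['AU'].size()

println counts
println "AU only: ${auOnly}"
```

The lookup for one country reads a single bucket, while the group count has no bucket to skip, which is why the primary key alone does not speed it up.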

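Hence, this analytical query is much better suited to the graph analytics framework Faunus. It does not require a root vertex, because it executes the groupCount by way of a complete database scan and thereby avoids the supernode problem. Faunus also uses Gremlin as its query language, so you only need to modify your query slightly: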

g.V.country.groupCount.cap...

HTH, Matthias
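As a rough illustration of why a full scan sidesteps the supernode, the sketch below counts countries over partitions of the data and then merges the partial results. It is plain Groovy over a hypothetical in-memory list of person records with a country property on each record. It does not use Faunus, and the partition size is an arbitrary assumption.

```groovy
// Hypothetical person records standing in for vertices that each carry
// a 'country' property. Not Faunus or Titan code.
def r = new Random(100)
def countries = ['AU', 'US', 'CN', 'NZ']
def people = (1..10000).collect {
    [name: "John the #${it}", country: countries[r.nextInt(countries.size())]]
}

// "Map" phase: each partition of the scan counts its own slice on its own.
// No single vertex has to hand out all of its edges.
def partials = people.collate(2500).collect { part ->
    part.countBy { it.country }
}

// "Reduce" phase: merge the partial counts into one result.
def total = [:].withDefault { 0 }
partials.each { partial ->
    partial.each { c, n -> total[c] += n }
}

println total
```

Each partition can be counted independently, so the work spreads across workers instead of funnelling through one root vertex.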
