graph-databases - Gremlin Big Pi operation (product of a sequence of terms), followed by a sum based on id

Question

I have a dataset that looks like this:

V1 = name:"some name1"
V2 = name:"some name2"
V3 = name:"some name3"
V4 = name:"some name4"

E1 = weight:0.2, StartVertex:V1, EndVertex:V2
E2 = weight:0.3, StartVertex:V1, EndVertex:V3
E3 = weight:0.4, StartVertex:V1, EndVertex:V4
E4 = weight:0.5, StartVertex:V2, EndVertex:V1
E5 = weight:0.6, StartVertex:V2, EndVertex:V3
...

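I have a Gremlin query that finds some paths between these vertices.

I want to do two things with it.

1: I want to be able to find the product of all the weights in a path (path_edge1.weight * path_edge2.weight * ...)

2: I want to be able to sum the results of the paths, grouped by end vertex.

Pseudocode for what I want to achieve: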

g.V().has('name',REGEX,".+some_query.+").inE.outV.inE.FindingAPathSomehow.path{path_score = 1 : foreach edge e: path_score = path_score * e.weight}{it.lastV().id}.sumWhereIdIsEqual(it[1])

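Hopefully this is somewhat understandable.

Because I am using RexPro, I would like to be able to do everything in a pure Gremlin/Groovy script.

I have looked around for an answer, but have not found a way to do this.

In case the above is unclear, here is a further explanation:

When querying, I look for vertices whose name contains the substring "some_query". This gives me a set of starting vertices.

From these vertices I look for a specific kind of path in my graph, which gives me several paths that might look like this: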

V = Vertex
E = Edge

Path1 = V3 - E2 - V1
Path2 = V4 - E5 - V7 - E1 - V1

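Each of these edges has a weight property. With it, I want to compute the so-called "Big Pi" or "Capital Pi", which is the product of a sequence. Think of summation (Σ), but with multiplication instead of addition.

The result for Path1 would be the weight of E2, or 0.3 in the example above. For Path2 it would be E5.weight * E1.weight, which in the example above is 0.6 * 0.2 = 0.12.

In this case we start from vertices V3 and V4, and both paths end at V1. I then want to sum the scores of Path1 and Path2, because both have V1 as their end vertex. This gives V1 a total score of 0.3 + 0.12 = 0.42. If there were also a Path3 with end vertex V2 and score 0.34, the result list would have to contain both elements: {[V1, 0.42], [V2, 0.34]}.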
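The arithmetic described above can be sketched in plain Groovy, independent of any graph library. This is only an illustration of the desired result, not a Gremlin traversal: the `paths` list and the `product` closure are hypothetical stand-ins, and Path3 is modelled as a single edge of weight 0.34 because only its score is given.

```groovy
// Hypothetical stand-in data: each path is reduced to its end vertex name
// and the list of edge weights along it (values from the example above).
def paths = [
    [end: 'V1', weights: [0.3]],        // Path1 = V3 - E2 - V1
    [end: 'V1', weights: [0.6, 0.2]],   // Path2 = V4 - E5 - V7 - E1 - V1
    [end: 'V2', weights: [0.34]],       // Path3 (assumed to be one edge of weight 0.34)
]

// Big Pi: multiply all weights of one path, starting from 1.
def product = { List ws -> ws.inject(1) { acc, w -> acc * w } }

// Group the paths by end vertex, then sum the per-path products in each group.
def byEnd = paths.groupBy { it.end }
def scores = byEnd.collectEntries { end, ps -> [(end): ps.sum { product(it.weights) }] }

println scores   // [V1:0.42, V2:0.34]
```

Groovy decimal literals are `BigDecimal`, so the products and sums here are exact.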

score 1 · Accepted Answer

You can do something like this:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.v(1).as('s').outE.inV.loop('s'){it.loops<3}{true}.path
==>[v[1], e[7][1-knows->2], v[2]]
==>[v[1], e[8][1-knows->4], v[4]]
==>[v[1], e[9][1-created->3], v[3]]
==>[v[1], e[8][1-knows->4], v[4], e[10][4-created->5], v[5]]
==>[v[1], e[8][1-knows->4], v[4], e[11][4-created->3], v[3]]

The above uses the toy graph to get some paths that produce multiple results with the same endpoint. Since you want to multiply the edge weights for each path and then sum them for each vertex ending a path, it would seem that a good return value for all this would be a Map keyed on the end vertex, with the value being the list of lists of weights for each path. To do that, I used a groupBy:

gremlin> m=g.v(1).as('s').outE.inV.loop('s'){it.loops<3}{true}.path.groupBy{it[it.size()-1]}{it.findAll{it instanceof Edge}.collect{it.getProperty("weight")}}.cap.next()
==>v[3]=[[0.4], [1.0, 0.4]]
==>v[2]=[[0.5]]
==>v[5]=[[1.0, 1.0]]
==>v[4]=[[1.0]]

The first closure to the groupBy provides the key (i.e. the last vertex in the path). The second closure filters the Edge objects and pulls off the weight to store in the list of paths for each key. From here you can operate on m (the Map) to finish the calculation. At this point we're basically just doing straight Groovy. The following shows the calculation of the product of the weights:

gremlin> m.collect{k,v->[(k):v.collect{p->p.inject{product,weight->product*weight}}]}           
==>{v[3]=[0.4, 0.4000000059604645]}
==>{v[2]=[0.5]}
==>{v[5]=[1.0]}
==>{v[4]=[1.0]}
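The second value for v[3] prints as 0.4000000059604645 rather than 0.4. A likely explanation, assuming the toy graph stores its weights as Java floats (which the output suggests), is that Groovy evaluates float arithmetic as double. The plain-Groovy sketch below reproduces the effect with hypothetical lists `single` and `pair`, and also shows how `inject` behaves when no initial value is passed.

```groovy
// Assumption: the weights are java.lang.Float values, as the printed output suggests.
def single = [0.4f]
def pair   = [1.0f, 0.4f]

// inject without an initial value uses the first element as the seed,
// so a one-element list comes back unchanged (still a Float).
def p1 = single.inject { product, weight -> product * weight }

// Groovy evaluates float * float as double, which exposes the float's rounding error.
def p2 = pair.inject { product, weight -> product * weight }

println p1   // 0.4
println p2   // 0.4000000059604645
```

If exact decimal results matter, converting the weights to `BigDecimal` before multiplying is one option. Note also that `inject` without a seed throws on an empty list, so a path with no edges would need a seed of 1.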

Once you have that much, calculating the sum per end vertex is just done with the groovy sum function:

gremlin> m.collect{k,v->[(k):v.collect{p->p.inject{product,weight->product*weight}}.sum()]}
==>{v[3]=0.800000011920929}
==>{v[2]=0.5}
==>{v[5]=1.0}
==>{v[4]=1.0}

Note that I'm breaking this up into multiple Gremlin statements for ease of explanation and readability, but if you like the single line style you could go that way too. The best way to get it back to single line would be to add a third closure to the groupBy which will act as a reduce step to calculate the weight product/sum.
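To make the key / value / reduce idea concrete without a graph at hand, here is a minimal plain-Groovy sketch. The `groupByReduce` closure is a hypothetical helper written for this illustration, not part of Gremlin, and the `paths` list is stand-in data (strings for vertices, maps for edges) using the weights printed above. In the actual traversal the three closures would be passed to groupBy as described.

```groovy
// Hypothetical helper mirroring a key / value / reduce trio of closures.
def groupByReduce = { Collection items, Closure keyFn, Closure valueFn, Closure reduceFn ->
    def grouped = [:]
    items.each { item ->
        // Map.get(key, default) stores the default on first use of a key.
        grouped.get(keyFn(item), []) << valueFn(item)
    }
    grouped.collectEntries { k, vs -> [(k): reduceFn(vs)] }
}

// Stand-in paths: strings are vertices, maps are edges.
def paths = [
    ['v1', [weight: 0.5], 'v2'],
    ['v1', [weight: 1.0], 'v4'],
    ['v1', [weight: 0.4], 'v3'],
    ['v1', [weight: 1.0], 'v4', [weight: 1.0], 'v5'],
    ['v1', [weight: 1.0], 'v4', [weight: 0.4], 'v3'],
]

def scores = groupByReduce(paths,
    { it[-1] },                                                          // key: last vertex of the path
    { it.findAll { e -> e instanceof Map }.collect { e -> e.weight } },  // value: edge weights of the path
    { lists -> lists.collect { ws -> ws.inject { p, w -> p * w } }.sum() }) // reduce: sum of per-path products

println scores
```

Because the literals here are `BigDecimal`, the score for v3 is exactly 0.8, unlike the float-based console output above.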
