
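GraphFrames has a nice example of stateful motifs. How can I explicitly return the count? As you can see, the output contains only the vertices and edges, but not the count.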

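How can the following expression be modified so that it has access not (only) to the edges but also to the attributes of the vertices?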

when(relationship === "friend", cnt + 1).otherwise(cnt)

That is, how can I extend the counting to compute

  • for each vertex, the number of friends with age > 30
  • the ratio friendsGreater30 / allFriends

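    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, lit, when}
    import org.graphframes.examples

    val g = examples.Graphs.friends  // get example graph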
    
    // Find chains of 4 vertices.
    val chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")
    
    // Query on sequence, with state (cnt)
    //  (a) Define method for updating state given the next element of the motif.
    def sumFriends(cnt: Column, relationship: Column): Column = {
      when(relationship === "friend", cnt + 1).otherwise(cnt)
    }
    //  (b) Use sequence operation to apply method to sequence of elements in motif.
    //      In this case, the elements are the 3 edges.
    val condition = Seq("ab", "bc", "cd").
      foldLeft(lit(0))((cnt, e) => sumFriends(cnt, col(e)("relationship")))
    //  (c) Apply filter to DataFrame.
    val chainWith2Friends2 = chain4.where(condition >= 2)
    

Source: http://graphframes.github.io/user-guide.html

    chainWith2Friends2.show()
    

which outputs:

+-------------+------------+-------------+------------+-------------+------------+--------------+
|            a|          ab|            b|          bc|            c|          cd|             d|
+-------------+------------+-------------+------------+-------------+------------+--------------+
|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]| [e,Esther,32]|
|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,b,friend]|    [b,Bob,36]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,d,friend]|  [d,David,29]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,f,follow]|  [f,Fanny,36]|
| [d,David,29]|[d,a,friend]| [a,Alice,34]|[a,b,friend]|   [b,Bob,36]|[b,c,follow]|[c,Charlie,30]|
| [a,Alice,34]|[a,e,friend]|[e,Esther,32]|[e,d,friend]| [d,David,29]|[d,a,friend]|  [a,Alice,34]|
+-------------+------------+-------------+------------+-------------+------------+--------------+

1 Answer


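Note that sumFriends returns a Column, and condition is a Column as well. That is why you can reference it in the where clause without quotes. So all you have to do is add that column to your DataFrame. After running the code above, I can run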

    chain4.withColumn("condition", condition).select("condition").show

+---------+
|condition|
+---------+
|        1|
|        0|
|        0|
|        0|
|        0|
|        3|
|        3|
|        3|
|        2|
|        2|
|        3|
|        1|
+---------+
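
To see the count next to the matched vertices and edges rather than on its own, one option is to give the column a name and keep it on the DataFrame. This is a minimal sketch, assuming Spark with GraphFrames on the classpath and a local master; the names session, friendCount and withCount are illustrative and do not come from the user guide.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.graphframes.examples

// Reuses an existing session if there is one; the local master is an assumption.
val session = SparkSession.builder().master("local[*]").appName("motif-count").getOrCreate()

val g = examples.Graphs.friends
val chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")

// Same fold as in the question: number of "friend" edges among ab, bc, cd.
val friendCount: Column = Seq("ab", "bc", "cd").foldLeft(lit(0)) { (cnt, e) =>
  when(col(e)("relationship") === "friend", cnt + 1).otherwise(cnt)
}

// Keep every motif column and append the count under a hypothetical name.
val withCount = chain4.withColumn("friendCount", friendCount)

// The named column can then be used for filtering and stays visible in the output.
withCount.where(col("friendCount") >= 2).show(false)
```

Naming the column once also avoids repeating the fold expression in both the filter and the projection.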

You can also use chain4.select(condition).
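
The same pattern should extend to vertex attributes, because each vertex in a motif is also a struct column (for example col("a")("age"), or "b.age" as a string path). The sketch below is one possible way to approach the two counts asked about in the question, not something taken from the GraphFrames guide. It first folds over the vertices of the chain to count those older than 30, and then uses a one-edge motif with a groupBy to get, per source vertex, the number of friends older than 30 and their share of all friends. The names session, olderThan30, friendStats, allFriends, friendsGreater30 and ratio are illustrative, and treating an outgoing "friend" edge as "has a friend" is an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit, sum, when}
import org.graphframes.examples

// Reuses an existing session if there is one; the local master is an assumption.
val session = SparkSession.builder().master("local[*]").appName("friends-over-30").getOrCreate()
val g = examples.Graphs.friends

// (1) State over vertices: how many of a, b, c, d in each chain are older than 30.
val chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")
val olderThan30 = Seq("a", "b", "c", "d").foldLeft(lit(0)) { (cnt, v) =>
  when(col(v)("age") > 30, cnt + 1).otherwise(cnt)
}
chain4.withColumn("olderThan30", olderThan30).show(false)

// (2) Per source vertex: friends older than 30 and their share of all friends.
val friendStats = g.find("(a)-[e]->(b)")
  .where(col("e.relationship") === "friend")
  .groupBy(col("a.id").as("id"))
  .agg(
    count(lit(1)).as("allFriends"),
    sum(when(col("b.age") > 30, 1).otherwise(0)).as("friendsGreater30"))
  .withColumn("ratio", col("friendsGreater30") / col("allFriends"))
friendStats.show()
```

Vertices without any outgoing "friend" edge do not appear in friendStats; a left join against g.vertices would be needed to list them with a count of zero.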

Hope this helps.

Answered 2017-12-04T03:14:12.697