I have the pageranks result from ParallelPersonalizedPageRank in GraphFrames, which is a DataFrame where each row holds a SparseVector, as follows:
+---------------------------------------+
|pageranks                              |
+---------------------------------------+
|(1887,[0, 1, 2,...][0.1, 0.2, 0.3, ...]|
|(1887,[0, 1, 2,...][0.2, 0.3, 0.4, ...]|
|(1887,[0, 1, 2,...][0.3, 0.4, 0.5, ...]|
|(1887,[0, 1, 2,...][0.4, 0.5, 0.6, ...]|
|(1887,[0, 1, 2,...][0.5, 0.6, 0.7, ...]|
+---------------------------------------+
What is the best way to add up all the elements of each SparseVector and generate a sum or average per row? I suppose we could convert each SparseVector to a dense array with toArray and traverse each array with two nested loops, to get something like this:
+-----------+
|pageranks  |
+-----------+
|avg1       |
|avg2       |
|avg3       |
|avg4       |
|avg5       |
|...        |
+-----------+
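
To make the computation concrete, here is a minimal sketch of the nested-loop approach in plain Python, without Spark. The `(size, indices, values)` tuples are a hypothetical stand-in for a SparseVector, the helper names `to_dense` and `row_averages` are made up for illustration, and the numbers are invented, not my real data:

```python
# Hypothetical stand-in for a Spark SparseVector: (size, indices, values).
# Plain Python only, so it runs without Spark; the data below is made up.
rows = [
    (5, [0, 1, 2], [0.1, 0.2, 0.3]),
    (5, [0, 1, 2], [0.2, 0.3, 0.4]),
]


def to_dense(size, indices, values):
    """Expand a sparse triple into a dense list, as toArray would."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def row_averages(rows):
    """Outer loop over rows, inner loop over every dense element."""
    averages = []
    for size, indices, values in rows:
        total = 0.0
        for x in to_dense(size, indices, values):
            total += x
        # The average is over the full vector size, zeros included.
        averages.append(total / size)
    return averages


print(row_averages(rows))
```

The inner loop mostly visits zeros, which is why this feels wasteful on vectors of size 1887.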
I am sure there must be a better way, but I could not find much in the API docs about SparseVector operations. Thanks!