I am trying to use connectedComponents() from graphframes in pyspark to compute the connected components of a fairly large graph, with roughly 1,800K vertices and 500K edges.

edgeDF.printSchema()
root
 |-- src: string (nullable = true)
 |-- dst: string (nullable = true)


vertDF.printSchema()
root
 |-- id: string (nullable = true)

vertDF.count()
1879806

edgeDF.count()
452196

import graphframes as gf  # assumed: gf is an alias for the graphframes package

custGraph = gf.GraphFrame(vertDF, edgeDF)

comp = custGraph.connectedComponents()

After 6 hours the job still had not finished. I am running pyspark on a single machine with Windows.

a. Is a computation like this feasible in the given setup?

b. I am getting warning messages like the following:

[rdd_73_2, rdd_90_2]
[Stage 21:=========>        (2 + 2) / 4][Stage 22:>                 (0 + 2) / 4]16/10/13 01:28:42 WARN Executor: 2 block locks were not released by TID = 632:

[rdd_73_0, rdd_90_0]
[Stage 21:=============>    (3 + 1) / 4][Stage 22:>                 (0 + 3) / 4]16/10/13 01:28:43 WARN Executor: 2 block locks were not released by TID = 633:

[rdd_73_1, rdd_90_1]
[Stage 37:>                 (0 + 4) / 4][Stage 38:>                 (0 + 0) / 4]16/10/13 01:28:47 WARN Executor: 3 block locks were not released by TID = 844:

[rdd_90_0, rdd_104_0, rdd_107_0]

What does this mean?

c. How do we specify that a graph is undirected in graphframes? Do we need to add the edges in both directions?
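
To make the expected result concrete, here is a minimal single-machine sketch of what a connected-components computation produces on a toy graph. It uses a plain-Python union-find, which is a swapped-in technique for illustration and not how graphframes computes its result. The function name `connected_components` and the toy data are hypothetical. The sketch assumes every edge endpoint appears in the vertex list, and it treats each (src, dst) pair as undirected, so no reverse edges are added.

```python
def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    # Every vertex starts out as the root of its own one-element set.
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Keep the smaller id as the root so labels are deterministic.
            if root_src < root_dst:
                parent[root_dst] = root_src
            else:
                parent[root_src] = root_dst

    return {v: find(v) for v in vertices}


# Hypothetical toy data shaped like vertDF (id) and edgeDF (src, dst).
toy_vertices = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("c", "b"), ("d", "e")]
components = connected_components(toy_vertices, toy_edges)
```

Here "a", "b" and "c" end up with one label and "d" and "e" with another, even though the edge ("c", "b") points "against" the edge ("a", "b"). That is the kind of undirected grouping I am hoping to get from connectedComponents().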
