Questions tagged [graphframes]

174 questions

0 votes

0 answers

1108 views

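apache-spark - pyspark graphframes: finding the connected components of a large graph

I am trying to use connectedComponents() from graphframes in pyspark to compute the connected components of a fairly large graph, with roughly 1800K vertices and 500k edges.

After 6 hours the job still has not finished. I am running pyspark on a single Windows machine.

a. Is such a computation feasible in this setup?

b. I get warning messages like the following

What does this mean?

c. How do we specify that a graph is undirected in GraphFrames? Do we need to add edges in both directions?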
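For reference, the computation being asked about can be sketched in plain Python with a union-find structure. This is only an illustration of what "connected components" means on a small edge list, not the GraphFrames implementation; the function name and the toy data are hypothetical. Union-find ignores edge direction, so (2, 1) and (1, 2) are equivalent here.

```python
def connected_components(vertices, edges):
    """Label every vertex with the smallest vertex id in its component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            # Keep the smaller id as the root so labels are deterministic.
            if a < b:
                parent[b] = a
            else:
                parent[a] = b

    return {v: find(v) for v in vertices}


# Hypothetical toy graph: {1, 2, 3} and {4, 5} are separate components.
toy_vertices = [1, 2, 3, 4, 5]
toy_edges = [(2, 1), (3, 2), (4, 5)]
labels = connected_components(toy_vertices, toy_edges)
```

A single-machine version like this can serve as a sanity check on a sample of the data before running the distributed job.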

2016-10-12T20:38:31.307

0 votes

1 answer

845 views

apache-spark - Using sc.parallelize inside map() or any other solution?

I have the following issue: for each id in column A, I need to find all combinations of the values in column B and return the results as a DataFrame.

Given the input DataFrame in the example below

I need to get the following output DataFrame (it is for GraphX/GraphFrames)

The only solution I have come up with so far is:

output: [(1, [(20,15),(30,20),(30,15)]),(5,[(10,14)]),(3,[(50,33)])]

And here I'm stuck :( How do I turn this into the DataFrame that I need? One idea was to use parallelize:

For spark_sc I have another file named spark_sc.py

but my code fails:

If I call spark_sc.sc() outside of map(), it works.

Any idea what I am missing in the last step? Is it possible to use parallelize() here at all, or do I need a completely different solution? Thanks!
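The pairing step itself can be sketched in plain Python with itertools.combinations, emitting one flat (id, value, value) row per pair instead of a nested list per id. The input rows below are hypothetical, since the question's input table is not shown, and the function name is illustrative. A flat row shape like this is what a flatMap-style step would produce, which would avoid the need for a second parallelize call.

```python
from itertools import combinations


def pairs_per_id(rows):
    """Return one (id, value_a, value_b) row per unordered pair of values sharing an id."""
    grouped = {}
    for key, value in rows:
        grouped.setdefault(key, []).append(value)

    flat_rows = []
    for key, values in grouped.items():
        # combinations() yields each unordered pair exactly once.
        for a, b in combinations(values, 2):
            flat_rows.append((key, a, b))
    return flat_rows


# Hypothetical (A, B) input rows, not taken from the question.
sample_rows = [(1, 20), (1, 15), (1, 30), (5, 10), (5, 14), (3, 50), (3, 33)]
flat = pairs_per_id(sample_rows)
```

Pairs come out in input order, so their orientation may differ from the output listed in the question.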

apache-spark pyspark apache-spark-sql graphframes

2016-10-19T01:20:25.930

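0 votes

1 answer

130 views

apache-spark - Is GraphFrames compatible with typed Datasets?

We currently use typed Datasets at work, and we are exploring GraphFrames.

However, GraphFrames appears to be built on DataFrames, which are Dataset[Row]. Is GraphFrames compatible with typed Datasets, for example Dataset[Person]?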

apache-spark graphframes

2016-11-10T03:01:12.337

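0 votes

0 answers

538 views

scala - How can I edit columns in GraphFrame aggregate messages?

I am new to GraphFrames and Scala. I am writing a kind of label propagation algorithm (quite different from the library algorithm). Essentially, each vertex has an array "memVector" and each edge has a float value "floatWeights". I want to update each vertex's memVector to the sum of (floatWeights * memVector) over all of its neighbors. This is the code I wrote for that:

The aggfunc I wrote is incorrect, because I cannot multiply an array by a float directly. I am running the above in spark-shell and get the following error on the last line:

Am I close? Any workaround or solution would be appreciated.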
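The intended update can be written out in plain Python to make the arithmetic explicit: each edge scales the sender's vector element by element, and the receiver sums what arrives. This is a sketch of the math only, not GraphFrames code. It assumes messages flow from the edge source to the edge destination, and the function name and toy data are hypothetical.

```python
def aggregate_weighted_vectors(mem_vectors, weighted_edges):
    """For each vertex, sum weight * memVector over its incoming edges."""
    dim = len(next(iter(mem_vectors.values())))
    totals = {v: [0.0] * dim for v in mem_vectors}

    for src, dst, weight in weighted_edges:
        # Assumption: the message travels from src to dst.
        for i, x in enumerate(mem_vectors[src]):
            # Scale element by element; an array cannot be multiplied as a whole.
            totals[dst][i] += weight * x
    return totals


# Hypothetical toy data: vertex id -> memVector, and (src, dst, floatWeight) edges.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0]}
toy_weighted_edges = [("a", "c", 0.5), ("b", "c", 2.0), ("c", "a", 1.0)]
updated = aggregate_weighted_vectors(toy_vectors, toy_weighted_edges)
```

Vertices that receive no messages keep an all-zero vector in this sketch; whether they should instead keep their previous memVector depends on the algorithm.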

scala apache-spark graphframes

2016-11-22T04:53:48.407

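0 votes

4 answers

1344 views

pyspark - Dataproc: Jupyter pyspark notebook cannot import the graphframes package

On a Dataproc Spark cluster, the graphframes package is available in spark-shell but not in the Jupyter pyspark notebook.

Pyspark kernel configuration:

The following is the command used to initialize the cluster: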

pyspark jupyter google-cloud-dataproc graphframes

2016-11-30T17:35:55.583

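0 votes

1 answer

1651 views

graphframes - GraphFrames BFS problem

I am testing the graphframes BFS toy example:

The result I get is:

This is odd, because Fanny and David also have outgoing edges, and the vertices they link to have outgoing edges as well. The resulting DataFrame should therefore contain not only the one-hop paths but all paths from the source vertex.

I created a toy graph of my own:

When I run the same query:

I still get only the one-hop neighbors. Am I missing something? I also tested other operators that mean "not equal", without success. A wild guess: maybe when BFS reaches the source vertex again (it should look at it, but not visit its neighbors), it does not match the "toExpr" expression and aborts.

Another question: GraphFrames are directed, aren't they? To get an "undirected graph", I should add the reciprocal edges, shouldn't I?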
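The "add reciprocal edges" idea from the last question can be sketched in plain Python as follows. It only shows the edge-list transformation and says nothing about how GraphFrames treats direction internally; the function name and toy edges are hypothetical.

```python
def with_reverse_edges(edges):
    """Return the edge list plus every missing reverse edge, without duplicates."""
    seen = set()
    result = []
    for src, dst in edges:
        # Emit the edge and its mirror image, skipping any pair already present.
        for edge in ((src, dst), (dst, src)):
            if edge not in seen:
                seen.add(edge)
                result.append(edge)
    return result


# Hypothetical directed toy edges; b -> c already has its reverse.
directed = [("a", "b"), ("b", "c"), ("c", "b")]
undirected = with_reverse_edges(directed)
```

Deduplicating matters when some edges already exist in both directions, since otherwise the mirrored list would contain repeated rows.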

graphframes

2016-12-02T23:15:53.447

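0 votes

0 answers

502 views

scala - Selecting edges in GraphFrames

I am applying BFS with GraphFrames in Scala. How can I sum the edge weights of the selected shortest path? I have the following code:

The output of the code above is:

But I need output like this:

Unlike the example above, my graph is large, and the query may actually return a large number of edges.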

scala apache-spark spark-dataframe graphframes

2016-12-04T10:05:55.663

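0 votes

2 answers

551 views

scala - Summing distances in an Apache Spark DataFrame

The following code produces a DataFrame in which each column contains three values, as shown below.

The output of the code above is:

In the output above, each column has three values, which can be interpreted as follows.

Basically, e0, e1 and e3 are edges. I want to sum the third element of each column, that is, add up the distance of each edge to get the total distance. How can I do this?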
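The requested sum can be illustrated in plain Python by treating one result row as a mapping from edge column name to a (source, destination, distance) triple. The column names, the triple layout, and the values below are hypothetical stand-ins, since the question's actual output is not shown.

```python
def total_distance(path_row, edge_columns):
    """Add up the third element (the distance) of each edge column in one row."""
    return sum(path_row[column][2] for column in edge_columns)


# Hypothetical row: each edge column holds (source, destination, distance).
row = {
    "e0": (0, 1, 4.0),
    "e1": (1, 2, 2.5),
    "e2": (2, 3, 1.5),
}
total = total_distance(row, ["e0", "e1", "e2"])
```

Passing the edge column names explicitly keeps the helper usable for paths of any length.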

scala apache-spark spark-dataframe graphframes

2016-12-08T15:56:32.030

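0 votes

3 answers

2880 views

scala - SBT with Apache Spark GraphFrames

I have the following SBT file. I am using Apache GraphFrames to compile Scala code and read a CSV file.

This is my Scala code:

When I try to build a JAR file with SBT, I get the following error during compilation: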

scala apache-spark sbt graphframes

2016-12-12T14:06:26.050

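0 votes

1 answer

1543 views

scala - Apache Spark GraphFrames is very slow on BFS

I am using Apache Spark GraphFrames with Scala. In the following code, I apply BFS and try to find the distance between vertices 0 and 100.

Source node: 0 Target node: 100

The vertex list is as follows:

This is the edge list:

The problem with the code above is that running it for just vertices 0 to 100 takes a very long time: it ran for 4 hours without producing output. I ran it on a single machine with 12 GB of RAM.

Could you advise me on how to speed up and optimize the code?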
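For comparison, a textbook breadth-first search over an adjacency list visits each vertex at most once. The plain-Python sketch below finds the hop count between two vertices on a small graph. It is not the GraphFrames implementation; the function name and the toy edge list are hypothetical, chosen only to mirror the source 0 and target 100 from the question.

```python
from collections import deque


def bfs_hops(edges, source, target):
    """Return the minimum number of directed edges from source to target, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in adjacency.get(node, []):
            # Mark on enqueue so each vertex enters the queue at most once.
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical toy graph: a chain 0 -> 1 -> ... -> 100 plus two shortcut edges.
chain = [(i, i + 1) for i in range(100)] + [(0, 50), (50, 100)]
hops = bfs_hops(chain, 0, 100)
```

Running a version like this on a sample of the edge list gives an expected answer to compare the distributed result against.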

scala apache-spark graph breadth-first-search graphframes

2016-12-19T17:06:03.727
