I am using Spark 2.3.2 and GraphFrames 0.7.0.
I have two DataFrames, node2attrDf and edge2attrDf. The code that generates them is here: https://gist.github.com/superPershing/56928c4f5420ea6334d7a9f6e389bda5
Their schemas look like this:
scala> node2attrDf.printSchema
root
|-- id: integer (nullable = true)
|-- combined: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: integer (containsNull = false)
scala> edge2attrDf.printSchema
root
|-- src: integer (nullable = true)
|-- dst: integer (nullable = true)
|-- info: struct (nullable = false)
| |-- dstNeighbors: array (nullable = true)
| | |-- element: long (containsNull = false)
| |-- J: array (nullable = true)
| | |-- element: integer (containsNull = false)
| |-- q: array (nullable = true)
| | |-- element: double (containsNull = false)
scala> node2attrDf.show(5)
+---+--------------------+
| id| combined|
+---+--------------------+
|148|[[405, 3], [121, ...|
|463|[[131, 2], [213, ...|
|471|[[117, 7], [7, 6]...|
|496|[[134, 7], [127, ...|
|833|[[597, 4], [566, ...|
+---+--------------------+
only showing top 5 rows
scala> edge2attrDf.show(5)
+---+---+------------+
|src|dst| info|
+---+---+------------+
|780|725|[[], [], []]|
|266|351|[[], [], []]|
|285|132|[[], [], []]|
|328|748|[[], [], []]|
|275|487|[[], [], []]|
+---+---+------------+
only showing top 5 rows
When I create a new GraphFrame from the two DataFrames:
val gDF = GraphFrame(node2attrDf, edge2attrDf)
I get the following error:
scala> val gDF = GraphFrame(node2attrDf, edge2attrDf)
<console>:31: error: type mismatch;
found : org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
val gDF = GraphFrame(node2attrDf, edge2attrDf)
^
<console>:31: error: type mismatch;
found : org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
val gDF = GraphFrame(node2attrDf, edge2attrDf)
^
The found type and the required type appear to be identical. Why does this error occur, and how can I fix it?