
I'm trying to use Databricks Connect to run some code from my Databricks notebook in an IDE. I can't figure out how to create a simple dataframe.

Using:

import spark.implicits._

var Table_Count = Seq((cdpos_df.count(),I_count,D_count,U_count)).toDF("Table_Count","I_Count","D_Count","U_Count")

gives the error message: value toDF is not a member of Seq[(Long, Long, Long, Long)]

Trying to create the dataframe from scratch:

var dataRow = Seq((cdpos_df.count(),I_count,D_count,U_count))

var schemaRow = List(
  StructField("Table_Count", LongType, true),
  StructField("I_Count", LongType, true),
  StructField("D_Count", LongType, true),
  StructField("U_Count", LongType, true)
)

var TableCount = spark.createDataFrame(
  sc.parallelize(dataRow),
  StructType(schemaRow)
)

gives the error message:

overloaded method value createDataFrame with alternatives:
  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
 cannot be applied to (org.apache.spark.rdd.RDD[(Long, Long, Long, Long)], org.apache.spark.sql.types.StructType)

1 Answer


Using a combination of the two approaches:

var TableCount = spark.createDataFrame(
  sc.parallelize(dataRow)
  // StructType(schemaRow)
).toDF("Table_Count","I_Count","D_Count","U_Count")

That got rid of the error, but I still need to build it out a bit more...
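For anyone who wants to keep the explicit schema, here is a minimal sketch of one likely explanation for both errors. Every createDataFrame overload that accepts a StructType expects Row objects, so an RDD of tuples does not match any of them; wrapping the values in Row should resolve that. The toDF error usually means the implicit conversions were not in scope at the call site, which typically requires import spark.implicits._ on a stable SparkSession identifier (a val or a parameter). The object name TableCountSketch, the helper names, the local[1] master, and the literal counts are hypothetical stand-ins for cdpos_df.count(), I_count, D_count and U_count; with Databricks Connect the session would come from your own configuration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object TableCountSketch {
  // Same four nullable Long columns as schemaRow in the question.
  val schema: StructType = StructType(List(
    StructField("Table_Count", LongType, nullable = true),
    StructField("I_Count", LongType, nullable = true),
    StructField("D_Count", LongType, nullable = true),
    StructField("U_Count", LongType, nullable = true)
  ))

  // Explicit-schema route: the data must be Row objects, not tuples.
  def withSchema(spark: SparkSession, total: Long, i: Long, d: Long, u: Long): DataFrame = {
    val rows = Seq(Row(total, i, d, u))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  // Tuple route: toDF is only available once the implicits are imported
  // from a stable SparkSession identifier (here, the method parameter).
  def withToDF(spark: SparkSession, total: Long, i: Long, d: Long, u: Long): DataFrame = {
    import spark.implicits._
    Seq((total, i, d, u)).toDF("Table_Count", "I_Count", "D_Count", "U_Count")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("table-count-sketch")
      .getOrCreate()

    // Hypothetical counts in place of the real ones from the question.
    withSchema(spark, 10L, 4L, 3L, 3L).show()
    withToDF(spark, 10L, 4L, 3L, 3L).show()

    spark.stop()
  }
}
```

Both helpers should produce a single-row dataframe with the same column names. The difference is that the explicit-schema route lets you control nullability, while the tuple route infers the types and marks the Long columns as non-nullable.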

Answered 2021-09-14T17:51:46.433