1

我有一个名为的特征,它需要一个类型参数,它的一个方法需要能够创建一个空的类型化数据集。

trait MyTrait[T] {
    val sparkSession: SparkSession
    val spark = sparkSession.session
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        import spark.implicits._ // to access .toDS() function
        // DOESN'T WORK.
        val emptyRDD = sparkContext.parallelize(Seq[T]())
        val accumulator = emptyRDD.toDS()
        ...
    }
}

到目前为止,我还没有让它工作。它抱怨no ClassTag for T,而且value toDS is not a member of org.apache.spark.rdd.RDD[T]

任何帮助,将不胜感激。谢谢!

4

1 回答 1

7

您必须在相同的范围内ClassTag[T]提供两者。Encoder[T]例如:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}
import scala.reflect.ClassTag


trait MyTrait[T] {
    val ct: ClassTag[T]
    val enc: Encoder[T]

    val sparkSession: SparkSession
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        val emptyRDD = sparkContext.emptyRDD[T](ct)
        spark.createDataset(emptyRDD)(enc)
    }
}

具体实施:

class Foo extends MyTrait[Int] {
   val sparkSession = SparkSession.builder.getOrCreate()
   import sparkSession.implicits._

   val ct = implicitly[ClassTag[Int]]
   val enc = implicitly[Encoder[Int]]
}

可以跳过RDD

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

trait MyTrait[T] {
    val enc: Encoder[T]

    val sparkSession: SparkSession
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        spark.emptyDataset[T](enc)
    }
}

检查如何将特征声明为采用隐式“构造函数参数”?,具体由BlaisorbladeAlexey Romanov回答

于 2017-12-05T00:30:20.717 回答