
I have a case class generated from a .proto file via ScalaPB, and it has some fields of type bcl.DateTime. The case class is defined as follows:

@SerialVersionUID(0L)
final case class EditorialAdEntity(
    customerid: _root_.scala.Int = 0,
    accountid: _root_.scala.Int = 0,
    orderId: _root_.scala.Long = 0L,
    entityId: _root_.scala.Long = 0L,
    dataFeedId: _root_.scala.Long = 0L,
    editorialStatusModifiedDTim: _root_.scala.Option[bcl.bcl.DateTime] = _root_.scala.None,
    modifiedDTim: _root_.scala.Option[bcl.bcl.DateTime] = _root_.scala.None,
    adTitle: _root_.scala.Predef.String = "",
    adDescription: _root_.scala.Predef.String = "",
    adDescription2: _root_.scala.Predef.String = "",
    displayURL: _root_.scala.Predef.String = "",
    businessName: _root_.scala.Predef.String = "",
...

I can create an instance of this case class and inspect its contents as follows:

val currentDt: DateTime = DateTime.of(value = Some(DateTimeUtils.getCurrentMillis), kind = Some(DateTimeKind.UTC), scale = Some(TimeSpanScale.MILLISECONDS))
val entity: EditorialAdEntity = EditorialAdEntity(customerid = customerId, accountid = accountId, adTitle = "test",
          orderId = orderId, serviceLevelId = 5, campaignType = campaignType,
          createdDtim = Some(currentDt), modifiedDTim = Some(currentDt),
          editorialStatusModifiedDTim = Some(currentDt) )
        
Logger.logInfo(entity.toProtoString)

However, when I create a Spark DataFrame on top of it, as shown here:

val data = spark.sqlContext.createDataFrame(List(entity))
data.show()

I get the following error:

Exception in thread "main" scala.ScalaReflectionException: <none> is not a term
    at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:199)
    at scala.reflect.internal.Symbols$SymbolContextApiImpl.asTerm(Symbols.scala:84)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.constructParams(ScalaReflection.scala:985)
    at org.apache.spark.sql.catalyst.ScalaReflection$.constructParams(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.getConstructorParameters(ScalaReflection.scala:965)
    at org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameters(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:782)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:737)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:785)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:784)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:784)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:737)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:785)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:784)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:784)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:724)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:906)
    at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:46)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:723)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:720)
    at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:313)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:285)
    at Scripts.SampleScripts.Protobuf.demo.EnforcementProtoTester$.main(EnforcementProtoTester.scala:43)
    at Scripts.SampleScripts.Protobuf.demo.EnforcementProtoTester.main(EnforcementProtoTester.scala)

If I remove the DateTime fields from the proto class, it seems to work fine. Any pointers on how to create a DataFrame on top of a proto class with bcl.DateTime fields?


1 Answer


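To use ScalaPB-generated classes with Spark, you need to add a library dependency on sparksql-scalapb and call ProtoSQL.createDataFrame() instead of spark.sqlContext.createDataFrame. Spark's built-in createDataFrame derives the schema by reflecting over the case class constructor, which is where the ScalaReflectionException in the stack trace is thrown. The process is described here: https://scalapb.github.io/sparksql.html#using-sparksql-scalapb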
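As a rough sketch of what that change might look like in the question's test program: the snippet below assumes sparksql-scalapb is already on the classpath in a version matching your Spark and ScalaPB releases, and that the generated EditorialAdEntity class is in scope (its package is not shown in the question, so the import for it is omitted). The object name ProtoDataFrameSketch and the local SparkSession settings are illustrative only. Depending on the library version, the `scalapb.spark.Implicits._` import may or may not be required; check the linked page for the version you use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scalapb.spark.Implicits._ // assumption: needed by newer sparksql-scalapb releases for the message encoders
import scalapb.spark.ProtoSQL

// Hypothetical driver object; EditorialAdEntity is the ScalaPB-generated
// class from the question and must be imported from its own package.
object ProtoDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("proto-dataframe-sketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()

    // Any instance of the generated message; fields left out keep their defaults.
    val entity = EditorialAdEntity(adTitle = "test")

    // ProtoSQL builds the schema from the protobuf descriptor
    // rather than from Scala reflection on the case class.
    val data: DataFrame = ProtoSQL.createDataFrame(spark, Seq(entity))
    data.show()

    spark.stop()
  }
}
```

The only change relative to the code in the question is the call that builds the DataFrame; the way the entity itself is constructed stays the same.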

answered 2020-07-11T02:32:49.283