
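Quick summary: I am trying to display multiple histograms from Spark DataFrames using Vegas-viz in Scala. I created a trait for building different kinds of histograms and implemented classes that extend it. When I instantiate one of the subclasses, I get a NullPointerException, which makes me think there is a nested DataFrame somewhere.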

Is there a workaround? Am I missing something, or is the error caused by something else?

Details: here is the trait:

trait Histogram {

  val rawdf: DataFrame
  val sparseDim: Seq[String]
  val name: String

  val xColumn: String
  val yColumn: String

  val group: DataFrame

  val plot: ExtendedUnitSpecBuilder = Vegas(name).
    withDataFrame(group).
    encodeX(
      field = xColumn,
      Quantitative,
      scale = Scale(ScaleType.Log),
      title = sparseDim.reduce((a, b) => a + ", " + b)
    ).
    encodeY(field = yColumn, Quantitative).
    mark(Bar)

  def show(): Unit = plot.show

}

And this is one of the classes that extend it:

class HistogramCount(val rawdf: DataFrame,
                     val sparseDim: Seq[String],
                     val name: String = "Histogram Count") extends Histogram {

  val xColumn = "cube"
  val yColumn = "count"

  override val group: DataFrame = rawdf.
    select("VALUE", sparseDim: _*).
    groupBy(sparseDim.head, sparseDim.tail: _*).
    count().
    withColumnRenamed("count", "cube").
    groupBy("cube").
    count()

}

When I create an instance of the subclass, I get the following error:

Exception in thread "main" java.lang.NullPointerException
at <Pointing to .withDataFrame(group) in the trait>

I suspect this is because evaluation is lazy, so group has not been evaluated yet when .withDataFrame(group) is called while plot is being created.

I tried forcing evaluation of the group DataFrame with a val evaluate: Long = group.rdd.count() before plot is built, but that does not solve the problem.


1 Answer


Solved it by making the variable plot lazy. Still not sure if that is the best way, though.
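For context, the likely cause is Scala's initialization order rather than Spark's lazy evaluation: the body of a trait runs before the body of the class that extends it, so a strict val in the trait that reads an abstract val sees null until the subclass has initialized it. The sketch below reproduces this without Spark or Vegas. All names in it (EagerChart, LazyChart, EagerCount, LazyCount, InitOrderDemo) are hypothetical stand-ins, and a String takes the place of both the DataFrame and the plot.

```scala
import scala.util.Try

// Hypothetical stand-in for the original trait: `plot` is a strict val,
// so it is computed while the trait is initialized, before any subclass body runs.
trait EagerChart {
  val group: String                      // abstract, assigned later by the subclass
  val plot: String = group.toUpperCase   // `group` is still null here
}

// Same shape, but `plot` is deferred until it is first accessed.
trait LazyChart {
  val group: String
  lazy val plot: String = group.toUpperCase
}

class EagerCount extends EagerChart {
  override val group: String = "counts"  // assigned only after EagerChart has initialized
}

class LazyCount extends LazyChart {
  override val group: String = "counts"
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    // Construction itself fails, because `plot` dereferences the null `group`.
    println(Try(new EagerCount()).isFailure)
    // Construction succeeds; `plot` is computed on first access, when `group` is set.
    println(new LazyCount().plot)
  }
}
```

This would also explain why calling group.rdd.count() inside the trait did not help: group itself is still null at that point, so there is nothing to evaluate. Declaring group as a lazy val or a def in the subclass is another commonly used way around the same problem, though making plot lazy is a reasonable fix on its own.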

answered 2018-09-05T15:18:34.473