I have a DataFrame with the following data:
val df = List(
  (1, "wwe", List(1, 2, 3)),
  (2, "dsad", List.empty),
  (3, "dfd", null)
).toDF("id", "name", "value")

df.show
+---+----+---------+
| id|name|    value|
+---+----+---------+
|  1| wwe|[1, 2, 3]|
|  2|dsad|       []|
|  3| dfd|     null|
+---+----+---------+
To explode the values of the array column, I used the following logic:
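import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{array, col, explode, lit, when}
import org.apache.spark.sql.types._

// Explode an array column, substituting a one-element default array when the column is null.
def explodeWithNull(f: StructField): Column = {
  explode(
    when(col(f.name).isNotNull, col(f.name))
      .otherwise(
        f.dataType.asInstanceOf[ArrayType].elementType match {
          case StringType  => array(lit(""))
          case DoubleType  => array(lit(0.0))
          case IntegerType => array(lit(0))
          case _           => array(lit(""))
        }
      )
  )
}

// Apply explodeWithNull to every array-typed column of the DataFrame in turn.
def explodeAllArraysColumns(dataframe: DataFrame): DataFrame = {
  val schema: StructType = dataframe.schema
  val arrayFields: Seq[StructField] = schema.filter(f => f.dataType.typeName == "array")
  arrayFields.foldLeft(dataframe) {
    (df: DataFrame, f: StructField) => df.withColumn(f.name, explodeWithNull(f))
  }
}

explodeAllArraysColumns(df).show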
+---+----+-----+
| id|name|value|
+---+----+-----+
|  1| wwe|    1|
|  1| wwe|    2|
|  1| wwe|    3|
|  3| dfd|    0|
+---+----+-----+
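Exploding this way, I lose the row of df that has the empty array (id 2). Ideally I don't want to lose that row; I would like either a null or a default value for that column in the exploded DataFrame. How can I achieve this?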
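For context, the row most likely disappears because `explode` emits no output rows for an empty array, and an empty array passes the `isNotNull` check, so the default is never substituted. The following is a minimal sketch of two possible directions, not a verified fix for this exact setup. It assumes Spark 2.2 or later (where `explode_outer` is available); the helper names `explodeAllArraysOuter`, `defaultFor`, and `explodeWithDefault` are hypothetical and introduced only for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{array, col, explode, explode_outer, lit, size, when}
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType, IntegerType, StructField}

// Option 1 (hypothetical helper): explode_outer keeps rows whose array is
// null or empty, producing a single row with a null element for them.
def explodeAllArraysOuter(dataframe: DataFrame): DataFrame =
  dataframe.schema.fields
    .filter(_.dataType.isInstanceOf[ArrayType])
    .foldLeft(dataframe) { (df, f) =>
      df.withColumn(f.name, explode_outer(col(f.name)))
    }

// Option 2 (hypothetical helpers): treat an empty array like a null one and
// substitute a one-element default array before exploding.
def defaultFor(elementType: DataType): Column = elementType match {
  case DoubleType  => lit(0.0)
  case IntegerType => lit(0)
  case _           => lit("")
}

def explodeWithDefault(f: StructField): Column = {
  val elementType = f.dataType.asInstanceOf[ArrayType].elementType
  val c = col(f.name)
  // size(c) > 0 is false for an empty array; for a null array size returns
  // -1 or null depending on configuration, and neither satisfies the
  // condition, so both cases fall through to the default array.
  explode(when(size(c) > 0, c).otherwise(array(defaultFor(elementType))))
}
```

With the first option, `explodeAllArraysOuter(df)` should keep the rows for ids 2 and 3 with a null in `value`. With the second, swapping `explodeWithDefault` in for `explodeWithNull` inside `explodeAllArraysColumns` should give those rows the default `0` instead.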