I have a Spark DataFrame and am trying to add Year, Month, and Day columns to it. The problem is that after adding these columns, the Month and Day values lose their leading zeros.

import org.apache.spark.sql.functions.{dayofmonth, lit, month, to_date, year}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val cityDF = Seq(("Delhi","India"),("Kolkata","India"),("Mumbai","India"),("Nairobi","Kenya"),("Colombo","Srilanka"),("Tibet","China")).toDF("City","Country")
val dateString = "2020-01-01"
val dateCol = to_date(lit(dateString))
val finaldf = cityDF.select($"*", year(dateCol).alias("Year"), month(dateCol).alias("Month"), dayofmonth(dateCol).alias("Day"))

I want to keep the leading zeros in the Month and Day columns, but the result is 1 instead of 01.
I am using the Year, Month, and Day columns to create Spark partitions, so I need the leading zeros intact. How do I keep the leading zeros in my DataFrame columns?

2 Answers

An Integer column can be converted to a String column, where leading zeros are possible, with the "format_string" function:

val finaldf =
  cityDF
    .select($"*",
      year(dateCol).alias("Year"),
      format_string("%02d", month(dateCol)).alias("Month"),
      format_string("%02d", dayofmonth(dateCol)).alias("Day")
    )
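
For readers unfamiliar with the "%02d" pattern, here is a small plain-Scala sketch of what it does, with no Spark involved. It assumes that format_string follows the same printf-style pattern syntax as Scala's own String formatting. The object and method names (ZeroPad, pad2) are made up for illustration.

```scala
// Plain Scala, no Spark required: illustrates the "%02d" pattern.
object ZeroPad {
  // Formats a non-negative integer with a minimum width of two digits,
  // filling with leading zeros where needed.
  def pad2(n: Int): String = "%02d".format(n)

  def main(args: Array[String]): Unit = {
    println(pad2(1))  // 01
    println(pad2(12)) // 12
  }
}
```

The result is a String, which is why the Month and Day columns produced this way are no longer integer columns.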
answered 2020-01-01T18:11:49.693

Why not simply use date_format for that?

val finaldf = cityDF.select(
  $"*",
  year(dateCol).alias("Year"),
  date_format(dateCol, "MM").alias("Month"),
  date_format(dateCol, "dd").alias("Day")
)
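
Since the question is about partitioning, here is a rough end-to-end sketch of how the date_format approach might be used with partitionBy. It assumes a local SparkSession and a shortened city list. The object name, the helper withDateParts, and the output path /tmp/city_partitions are hypothetical and not from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, lit, to_date, year}

object PartitionColumnsSketch {
  // Adds Year (integer) plus zero-padded Month and Day (strings) for a fixed date.
  def withDateParts(df: DataFrame, dateString: String): DataFrame = {
    val dateCol = to_date(lit(dateString))
    df.select(
      col("*"),
      year(dateCol).alias("Year"),
      date_format(dateCol, "MM").alias("Month"),
      date_format(dateCol, "dd").alias("Day")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    val cityDF = Seq(("Delhi", "India"), ("Nairobi", "Kenya")).toDF("City", "Country")
    val finaldf = withDateParts(cityDF, "2020-01-01")
    finaldf.printSchema() // Month and Day are string columns here

    // Hypothetical output path; the partition directories take the string
    // values as-is, e.g. Year=2020/Month=01/Day=01.
    finaldf.write.mode("overwrite").partitionBy("Year", "Month", "Day").parquet("/tmp/city_partitions")

    spark.stop()
  }
}
```

One thing to watch for: when the partitioned data is read back, Spark's partition column type inference may turn values such as 01 back into integers. If that matters, the setting spark.sql.sources.partitionColumnTypeInference.enabled is worth checking.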
answered 2020-01-01T19:18:19.710