
I'm writing methods in Scala that take Column arguments and return a Column. Within them, I need to compare the values of the columns (ranging from integers to dates) using logic similar to the example below, but I've been encountering an error.

The lit() calls are for example purposes only. In reality I'm passing columns from a DataFrame.select() into a method to do computation, and I need to compare those columns.

val test1 = lit(3)
val test2 = lit(4)

if (test1 > test2) { 
    print("tuff")
}

Error message.

Error : <console>:96: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Boolean
       if (test1 > test2) {

What is the correct way to compare Column objects in Spark? The Column documentation lists the > operator as valid for comparisons.

Edit: Here's a very contrived usage example. Assume the columns passed into the function are dates that need to be compared for business reasons, and the returned integer value also has some business significance.

someDataFrame.select(
   $"SomeColumn", 
   computedColumn($"SomeColumn1", $"SomeColumn2").as("MyComputedColumn")
)

Where computedColumn would be

def computedColumn(col1: Column, col2: Column): Column = {
    var returnCol: Column = lit(0)
    if (col1 > col2) { // this comparison is what triggers the type mismatch error
       returnCol = lit(4)
    }
    returnCol
}

Except in actual usage there is a lot more if/else logic that needs to happen in computedColumn, with the final result being a returned Column that is added to the select's output.


1 Answer


The expression col1 > col2 yields a Column (an unevaluated expression applied per row), not a Boolean, which is why it can't be used in a plain Scala if. You can use when to do the conditional comparison instead:

someDataFrame.select(
   $"SomeColumn", 
   when($"SomeColumn1" > $"SomeColumn2", 4).otherwise(0).as("MyComputedColumn")
)

If you prefer to write a function:

def computedColumn(col1: Column, col2: Column): Column = {
    when(col1 > col2, 4).otherwise(0)
}

someDataFrame.select(
   $"SomeColumn", 
   computedColumn($"SomeColumn1", $"SomeColumn2").as("MyComputedColumn")
)
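Since the real computedColumn has more branching, the if/else logic can be expressed by chaining additional when clauses before the otherwise. A minimal sketch, where the conditions and return codes (4, 2, 0) are placeholders for the actual business logic:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.when

// Each chained when is one "if/else if" branch, evaluated per row;
// otherwise supplies the final "else" value.
def computedColumn(col1: Column, col2: Column): Column = {
  when(col1 > col2, 4)
    .when(col1 === col2, 2)
    .otherwise(0)
}

This works the same way for date columns, since > compares the column values row by row rather than at driver time.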
answered 2021-02-01T19:39:44.870