I have a DataFrame in PySpark with the following structure:

DataFrame[Urlaubdate: string, Vacationdate: date, Datensatz: string, Jobname: string]

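Now I want to filter the DataFrame by comparing Vacationdate and Urlaubdate, but unfortunately they have different data types. I want to keep the rows where Vacationdate is later than Urlaubdate. Do you know how to do this?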

1 Answer

I think in this case you have to use a user-defined function, like this:

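from datetime import datetime

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def compare(urlaubdate, vacationdate):
    # Null values cannot be compared; treat them as "no match"
    if urlaubdate is None or vacationdate is None:
        return False
    # Urlaubdate is a string, so parse it first
    # (the "%Y-%m-%d" format is an assumption; adjust it to your data)
    parsed = datetime.strptime(urlaubdate, "%Y-%m-%d").date()
    # True when Vacationdate is later than Urlaubdate
    return vacationdate > parsed

# define a udf out of your function
compare_udf = udf(compare, BooleanType())

# filter your dataframe based on it
df_filtered = df.filter(compare_udf(df.Urlaubdate, df.Vacationdate))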
answered 2015-11-10T17:58:41.737