pandas - pyspark 错误：不支持 pandas_udf 表达式

翻译自：https://stackoverflow.com/questions/66756262 2021-03-23T02:03:59.663

40 次

代码使用下面的结构来定义 pyspark 数据框（版本 2.4.7）

schema = T.StructType([T.StructField('key', T.StringType(), True),
                    T.StructField('date', T.StringType(), True),
                    T.StructField('values1', T.FloatType(), True),
                    T.StructField('values2', T.FloatType(), True),
                    T.StructField('values3', T.FloatType(), True),
                    ])

窗口函数定义如下

schema = T.StructType([T.StructField("v", T.DoubleType())])
@F.pandas_udf(schema, F.PandasUDFType.GROUPED_MAP)
def exponential_average(v):
    alpha = 0.5
    fit = v.ewm(alpha=alpha).mean()
    return fit

应用窗口函数 pyspark 时会抛出错误：Expression 'exponential_average(values3#508)' not supported within a window function.

window = Window.partitionBy(F.col('key'))\
                    .orderBy('date')\
                    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('test', exponential_average(F.col('values3')).over(window))

使用有什么问题F.PandasUDFType.GROUPED_MAP吗？

pandas - pyspark 错误：不支持 pandas_udf 表达式

0 回答 0

Related

Reference