I have the following two DataFrames:
DataFrame "dfPromotion":
date | store
===================
2017-01-01 | 1
2017-01-02 | 1
DataFrame "dfOther":
date | store
===================
2017-01-01 | 1
2017-01-03 | 1
Later I need to union the two DataFrames above. Before that, however, I have to remove every row of dfOther whose date value is also contained in dfPromotion.
After this filtering step, the result should look like this:
DataFrame "dfPromotion" (this stays always the same, must not be changed in this step!)
date | store
===================
2017-01-01 | 1
2017-01-02 | 1
DataFrame "dfOther" (first row is removed as dfPromotion contains the date 2017-01-01 in the "date" column)
date | store
===================
2017-01-03 | 1
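Is there a way to do this in Java? So far I have only found the DataFrame.except method, but it compares all columns of the two DataFrames. I need to filter the second DataFrame by the date column only, because other columns may be added later and they could contain different values...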
Calling dfOther.filter(dfOther.col("date").isin(dfPromotion.col("date"))) throws the following exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) date#64 missing from date#0,store#13 in operator !Filter date#0 IN (date#64);
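
To pin down the intended semantics independently of Spark, the filtering step can be sketched in plain Java. This is only an illustration on in-memory lists, not Spark code: the class AntiJoinSketch, the Row type, and the method removeRowsWithPromotionDates are hypothetical names introduced here. Rows of dfOther are dropped when their date occurs in dfPromotion, and only the date column is compared.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AntiJoinSketch {

    // Hypothetical stand-in for one DataFrame row; this is not a Spark type.
    static final class Row {
        final String date;
        final int store;

        Row(String date, int store) {
            this.date = date;
            this.store = store;
        }
    }

    // Keeps only the rows of `other` whose date does not occur in `promotion`.
    // Only the date column is compared; all other fields are ignored.
    // `promotion` itself is never modified.
    static List<Row> removeRowsWithPromotionDates(List<Row> other, List<Row> promotion) {
        Set<String> promotionDates = new HashSet<>();
        for (Row row : promotion) {
            promotionDates.add(row.date);
        }
        List<Row> kept = new ArrayList<>();
        for (Row row : other) {
            if (!promotionDates.contains(row.date)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> dfPromotion = new ArrayList<>();
        dfPromotion.add(new Row("2017-01-01", 1));
        dfPromotion.add(new Row("2017-01-02", 1));

        List<Row> dfOther = new ArrayList<>();
        dfOther.add(new Row("2017-01-01", 1));
        dfOther.add(new Row("2017-01-03", 1));

        // Prints: 2017-01-03 | 1
        for (Row row : removeRowsWithPromotionDates(dfOther, dfPromotion)) {
            System.out.println(row.date + " | " + row.store);
        }
    }
}
```

In relational terms this is an anti join keyed on date alone. As far as I know, Spark 2.0 and later accept "leftanti" as the join type in join(right, joinExprs, joinType), which would express the same thing directly; whether that applies depends on the Spark version in use, which is not stated here.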