我有 2 个 pyspark 数据框,
i
+---+-----+
| ID|COL_A|
+---+-----+
| 1| 123|
| 2| 456|
| 3| 111|
| 4| 678|
+---+-----+
j
+----+-----+
|ID_B|COL_B|
+----+-----+
| 2| 456|
| 3| 111|
| 4| 876|
+----+-----+
我试图根据特定列的值进行减法i
,即 of 中存在的值不应存在于of中。j
COL_A
i
COL_B
j
预期输出应该是,
diff
+---+-----+
| ID|COL_A|
+---+-----+
| 1| 123|
| 4| 678|
+---+-----+
这是我的代码,
common = i.join(j.withColumnRenamed('COL_B', 'COL_A'), ['COL_A'], 'leftsemi')
diff = i.subtract(common)
diff.show()
但是输出出错了,
diff
+---+-----+
| ID|COL_A|
+---+-----+
| 2| 456|
| 1| 123|
| 4| 678|
| 3| 111|
+---+-----+
我在这里做错了吗?提前致谢。