You can try the code below. It counts the number of distinct OrderNo values for each status. I hope it helps.
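// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.sql.functions.{array, collect_set, explode, size}
import spark.implicits._ // enables toDF and the $"col" syntax
val rawDF = Seq(
  ("123", "Completed", "Pending", "Pending"),
  ("456", "Rejected", "Completed", "Completed"),
  ("789", "Pending", "In Progress", "Completed")
).toDF("OrderNo", "Status1", "Status2", "Status3")
// Gather the three status columns into one array, explode it to one row per (order, status), then count distinct orders per status.
val newDF = rawDF
  .withColumn("All_Status", array($"Status1", $"Status2", $"Status3"))
  .withColumn("Status", explode($"All_Status"))
  .groupBy("Status")
  .agg(size(collect_set($"OrderNo")).as("DistOrderCnt"))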
Here is the result. (Note: In Progress appears only once in the test data.)
+-----------+------------+
|     Status|DistOrderCnt|
+-----------+------------+
|  Completed|           3|
|In Progress|           1|
|    Pending|           2|
|   Rejected|           1|
+-----------+------------+
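If you want to sanity-check those counts without a Spark session, here is a minimal sketch of the same explode-then-group logic using plain Scala collections. The object name `DistinctOrderCountSketch` and the helper `distOrderCnt` are made up for illustration and are not part of Spark.

```scala
object DistinctOrderCountSketch {
  // Hypothetical helper: mirrors the array + explode + groupBy steps on an in-memory Seq.
  def distOrderCnt(rows: Seq[(String, String, String, String)]): Map[String, Int] = {
    // "Explode": emit one (status, orderNo) pair per status column.
    val pairs = rows.flatMap { case (orderNo, s1, s2, s3) =>
      Seq(s1, s2, s3).map(status => (status, orderNo))
    }
    // Group by status and count the distinct order numbers in each group.
    pairs.groupBy(_._1).map { case (status, ps) =>
      status -> ps.map(_._2).distinct.size
    }
  }

  def main(args: Array[String]): Unit = {
    // Same test rows as in the answer above.
    val rows = Seq(
      ("123", "Completed", "Pending", "Pending"),
      ("456", "Rejected", "Completed", "Completed"),
      ("789", "Pending", "In Progress", "Completed")
    )
    distOrderCnt(rows).toSeq.sortBy(_._1).foreach { case (status, n) =>
      println(s"$status: $n")
    }
  }
}
```

Order 123 has Pending twice, but it is counted only once for that status, which is the same de-duplication that collect_set performs in the Spark version.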