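To reduce shuffling when joining two RDDs, I decided to partition both of them with a HashPartitioner first. This is how I do it. Am I doing it correctly, or is there a better way to do this?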
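import org.apache.spark.HashPartitioner

val rddA = ...
val rddB = ...
// Partition both RDDs with the same partitioner (same partition count) so they are co-partitioned.
val numOfPartitions = rddA.getNumPartitions
val partitioner = new HashPartitioner(numOfPartitions)
val rddApartitioned = rddA.partitionBy(partitioner)
val rddBpartitioned = rddB.partitionBy(partitioner)
// Join the two co-partitioned RDDs.
val rddAB = rddApartitioned.join(rddBpartitioned)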