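I am working with a Map object in Scala where the key is a basket ID and the value is the set of item IDs contained in that basket. The goal is to take this Map and compute, for each basket, the set of other basket IDs that share at least one item with it.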

Suppose the input Map is

val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))

Is it possible to perform this computation in Spark so that I get the intersecting-basket information? For example, val intersects = Map("b1" -> Set("b2", "b3"), "b2" -> Set("b1"), "b3" -> Set("b1"), "b4" -> Set())

Thanks!

1 Answer

Something like...

val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))

def intersectKeys( set : Set[String], map : Map[String,Set[String]] ) : Set[String] = {
  // keep the key of every entry whose value shares at least one element with `set`
  map.collect { case (k, v) if set.intersect(v).nonEmpty => k }.toSet
}

// each set picks up its own key, which we don't want, so we subtract it back out
val intersects = basket.map { case (k,v) => (k, intersectKeys(v, basket) - k) }
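
The snippet above compares every basket against every other basket. As a rough alternative sketch (not part of the code above, and only one possible approach), the map can first be inverted into an item-to-baskets index so that each basket only looks at baskets that actually share an item with it. The names `intersectsViaIndex`, `sampleBaskets`, and `overlaps` are hypothetical, chosen for illustration. The sketch uses plain Scala collections only; the same flatMap-then-group shape is what one would probably reach for with Spark's `flatMap` and `groupByKey` on a pair RDD, but that translation is an assumption and is not shown or tested here.

```scala
// Hypothetical helper: builds an inverted index (item ID -> basket IDs) and uses it
// to find, for each basket, the other baskets that share at least one item.
def intersectsViaIndex(baskets: Map[String, Set[String]]): Map[String, Set[String]] = {
  // Invert the map: emit one (item, basketId) pair per item, then group by item.
  val itemToBaskets: Map[String, Set[String]] =
    baskets.toSeq
      .flatMap { case (basketId, items) => items.toSeq.map(item => item -> basketId) }
      .groupBy { case (item, _) => item }
      .map { case (item, pairs) => item -> pairs.map(_._2).toSet }

  // For each basket, union the baskets listed under each of its items,
  // then remove the basket's own ID.
  baskets.map { case (basketId, items) =>
    val sharing = items.flatMap(item => itemToBaskets.getOrElse(item, Set.empty[String]))
    basketId -> (sharing - basketId)
  }
}

val sampleBaskets = Map(
  "b1" -> Set("i1", "i2", "i3"),
  "b2" -> Set("i2", "i4"),
  "b3" -> Set("i3", "i5"),
  "b4" -> Set("i6")
)

val overlaps = intersectsViaIndex(sampleBaskets)
```

For the sample input this should yield the same mapping as the expected `intersects` in the question, with "b4" mapped to an empty set.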
answered 2020-08-30T05:02:10.990