I'm looking for a well-performing solution for the following distinctLastBy method:

import scala.language.higherKinds
implicit final class SeqPimp[A, S[A] <: Seq[A]](val s: S[A]) extends AnyVal {
  import scala.collection.generic.CanBuildFrom
  import scala.collection.mutable.Builder
  private final def build[B](build: Builder[B, S[B]] => Unit)(implicit cbf: CanBuildFrom[S[A], B, S[B]]): S[B] = {
    val b = cbf()
    build(b)
    b.result
  }
  final def distinctBy[B](f: A => B)(implicit cbf: CanBuildFrom[S[A], A, S[A]]): S[A] = {
    build[A] { builder =>
      val seen = scala.collection.mutable.Set[B]()
      for (a <- s; b = f(a); if !(seen contains b)) {
        seen += b
        builder += a
      }
    }
  }
  final def distinctLastBy[B](f: A => B)(implicit cbf: CanBuildFrom[S[A], A, S[A]]): S[A] = {
    // instead of keeping the first occurrence of an element, the last one is kept
    build[A] { builder => builder ++= s.view.reverse.distinctBy(f).reverse }
  }
}

An example:

case class Num(integralDigits: Int, fractionalDigits: Int)
val nums = Num(2, 11) :: Num(1, 23) :: Num(1, 45) :: Num(3, 11) :: Num(2, 22) :: Nil
nums distinctLastBy (_.integralDigits) // List(Num(1,45), Num(3,11), Num(2,22))
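
For comparison, here is a minimal sketch of the same reverse / distinct / reverse idea, assuming Scala 2.13 or later, where Seq has a built-in distinctBy and the CanBuildFrom machinery above is not needed. The name keepLastBy is hypothetical and only used for illustration.

```scala
case class Num(integralDigits: Int, fractionalDigits: Int)

// Hypothetical helper (not from the question): keeps the LAST element per key.
// Assumes Scala 2.13+, where Seq#distinctBy exists in the standard library.
def keepLastBy[A, B](xs: List[A])(f: A => B): List[A] =
  xs.reverse.distinctBy(f).reverse // distinctBy keeps the first hit, so reverse around it

val nums = List(Num(2, 11), Num(1, 23), Num(1, 45), Num(3, 11), Num(2, 22))
val lastKept = keepLastBy(nums)(_.integralDigits)
// lastKept == List(Num(1,45), Num(3,11), Num(2,22)): ordered by LAST occurrence of each key
```

The result is ordered by where each surviving element sits in the input, which is the ordering the next paragraph would like to improve on.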

Ideally, the resulting elements would be ordered by the first occurrence of their key (the by-argument) in the original list:

List(Num(2,22), Num(1,45), Num(3,11))

Any ideas?

2 Answers

If you're targeting the JVM, how about something JVM-based such as java.util.LinkedHashMap?

import java.util.LinkedHashMap
import scala.collection.JavaConversions._

final def distinctLastBy[B](f: A => B)(implicit cbf: CanBuildFrom[S[A], A, S[A]]): S[A] = {
  build[A] { builder =>
    val map = new LinkedHashMap[B, A]
    for (a <- s; b = f(a)) {
      // re-inserting an existing key replaces its value but keeps its original position
      map(b) = a
    }
    builder ++= map.values
  }
}
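
To try the idea outside the SeqPimp wrapper, here is a minimal self-contained sketch. It swaps in scala.collection.mutable.LinkedHashMap for the Java class (an assumption on my part, chosen to avoid the Java conversions) and always returns a List rather than the original collection type. The name distinctLastByKey is hypothetical.

```scala
import scala.collection.mutable

case class Num(integralDigits: Int, fractionalDigits: Int)

// Hypothetical helper: last element per key, ordered by the key's first occurrence.
def distinctLastByKey[A, B](xs: Seq[A])(f: A => B): List[A] = {
  val byKey = mutable.LinkedHashMap.empty[B, A]
  // Updating an existing key replaces the value but leaves its position untouched.
  for (a <- xs) byKey(f(a)) = a
  byKey.values.toList
}

val nums = List(Num(2, 11), Num(1, 23), Num(1, 45), Num(3, 11), Num(2, 22))
val result = distinctLastByKey(nums)(_.integralDigits)
// result == List(Num(2,22), Num(1,45), Num(3,11))
```

This makes a single pass over the input and one pass over the map's values, with no reversing.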

LinkedHashMap tracks the insertion order in a linked list. Of course, we can do the same thing ourselves in pure Scala:

import scala.collection.mutable.ListBuffer

final class Ref[A](var x: A)

final def pureDistinctLastBy[B](f: A => B)(implicit cbf: CanBuildFrom[S[A], A, S[A]]): S[A] = {
  build[A] { builder =>
    var seen = Map.empty[B, Ref[A]]
    val listBuf = ListBuffer.empty[Ref[A]]
    for (a <- s; b = f(a)) {
      seen.get(b) match {
        case Some(ref) => ref.x = a
        case None =>
          val ref = new Ref(a)
          seen += b -> ref
          listBuf += ref
      }
    }
    builder ++= listBuf.view.map(_.x)
  }
}

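The Refs save us from having to search the list when we update it with new information. These Refs would make any functional programming enthusiast uneasy, so instead of storing references to the items we can use the Map seen to track their positions in the list: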

final def functionalDistinctLastBy[B](f: A => B)(implicit cbf: CanBuildFrom[S[A], A, S[A]]): S[A] = {
  build[A] { builder =>
    val (seen, list) = ((Map.empty[B, Int], IndexedSeq.empty[A]) /: s){(acc, a) =>
      val (innerSeen, innerList) = acc
      val b = f(a)
      innerSeen.get(b) match {
        case Some(i) => (innerSeen, innerList.updated(i, a))
        case None => (innerSeen + (b -> innerList.size), innerList :+ a)
      }
    }
    builder ++= list
  }
}

Although I suspect it won't be as fast as the imperative versions.
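
The position-tracking fold can also be sketched on its own, without the builder and CanBuildFrom plumbing. The version below is only an illustration under that simplification: it uses foldLeft instead of the /: operator, always returns a Vector, and the name distinctLastByFold is hypothetical.

```scala
case class Num(integralDigits: Int, fractionalDigits: Int)

// Hypothetical helper: purely functional, no mutable state.
def distinctLastByFold[A, B](xs: Seq[A])(f: A => B): Vector[A] = {
  val (_, kept) = xs.foldLeft((Map.empty[B, Int], Vector.empty[A])) {
    case ((index, acc), a) =>
      val key = f(a)
      index.get(key) match {
        // Key seen before: overwrite the element at its recorded position.
        case Some(i) => (index, acc.updated(i, a))
        // New key: remember the position it is about to occupy, then append.
        case None    => (index + (key -> acc.size), acc :+ a)
      }
  }
  kept
}

val nums = List(Num(2, 11), Num(1, 23), Num(1, 45), Num(3, 11), Num(2, 22))
val folded = distinctLastByFold(nums)(_.integralDigits)
// folded == Vector(Num(2,22), Num(1,45), Num(3,11))
```

Vector is used as the accumulator because both updated and :+ are effectively constant time on it, which keeps the fold from degrading to quadratic behaviour.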

answered 2013-05-17T13:35:48.390

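If you want to keep the builder-based implementation, I can only confirm @James_pic's answer. Consider using a SortedMap if you want the result sorted by key in the end.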

Another, more lightweight possibility in terms of code is:

nums.groupBy(_.integralDigits).map(_._2.last)
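
One caveat: groupBy returns an immutable Map whose iteration order is not guaranteed, so the one-liner yields the right elements but not necessarily in first-occurrence order. A possible sketch that restores the order by walking the distinct keys of the original list (the names groups and ordered are only for illustration):

```scala
case class Num(integralDigits: Int, fractionalDigits: Int)

val nums = List(Num(2, 11), Num(1, 23), Num(1, 45), Num(3, 11), Num(2, 22))

// Each group keeps its elements in input order, so .last is the last occurrence.
val groups = nums.groupBy(_.integralDigits)

// Visit the keys in order of first occurrence and pick the last element of each group.
val ordered = nums.map(_.integralDigits).distinct.map(key => groups(key).last)
// ordered == List(Num(2,22), Num(1,45), Num(3,11))
```

This traverses the list more than once, so it trades some speed for brevity.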
answered 2013-05-17T14:48:01.513