2

如果我们想定义一个包含单个对象的案例类,比如一个元组,我们可以很容易地做到:

sealed case class A(x: (Int, Int))

在这种情况下,检索“x”值将花费一小段常数时间,而这个类将只占用一小段常数空间,无论它是如何创建的。

现在,假设我们想要保存一个值序列;我们可以这样:

sealed final case class A(x: Seq[Int])

这似乎和以前一样有效,只是现在存储和读取所有 x 的时间与 x.length 成正比。

然而,事实并非如此,因为有人可以这样做:

val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))

在这种情况下,a 对象看起来像一个在序列中保存单个 int 的无辜案例类,但实际上它需要千兆字节的内存,并且每次访问该单个元素都需要几秒钟的时间。

这可以通过将 List[T] 之类的东西指定为类型而不是 Seq[T] 来解决;然而,这看起来很难看,因为它添加了对特定实现的引用,而实际上其他表现良好的实现,如 Vector[T],也可以。

另一个令人担忧的问题是可以传递一个可变的 Seq[T],因此似乎至少应该使用 immutable.Seq 而不是 scala.collection.Seq(尽管编译器目前实际上不能强制执行不变性)。

查看大多数库,似乎常见的模式是使用 scala.collection.Seq[T],但这真的是个好主意吗?

或者也许使用 Seq 只是因为它是最短的类型,实际上最好使用 immutable.Seq[T]、List[T]、Vector[T] 或其他东西?

编辑中添加的新文本

查看类库,一些最核心的功能,如 scala.reflect.api.Trees 实际上确实使用 List[T],并且通常使用具体类似乎是个好主意。

但是,为什么要使用 List 而不是 Vector?

Vector 的长度为 O(1)/O(log(n)),具有前置、附加和随机访问,渐近地更小(由于 vtable 和 next 指针,List 大约大 3-4 倍),并且支持缓存高效和并行计算, 而 List 除了 O(1) 前置之外没有任何这些属性。

所以,我个人倾向于 Vector[T] 是图书馆数据结构中公开的东西的正确选择,尽管它似乎不太受欢迎,但人们不知道图书馆用户需要什么操作。

4

1 回答 1

1

First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.

You then go on to argue that a problem arises when using views instead of eager collections. But again the question is what the problem actually is? You use the example of lazily filtering a collection. In general running a filter will be an O(n) operation just as if you were iterating over the original list. In that example it would be O(1) for successive calls if that collection was made strict. But that's a problem of the calling site of your case class, not the definition of your case class.

The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.

There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but a general seq which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. Use collection.immutable.Seq instead, or if the collection doesn't need to be ordered, collection.immutable.Traversable.

So I agree with your suspicion:

Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?

No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.

于 2013-03-29T22:23:26.080 回答