performance - Scala implicit performance

Question

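This comes up a lot. Functions written with generics are noticeably slower in Scala. See the example below: the type-specific version runs about 1/3 faster than the generic one. That is doubly surprising, given that the generic part is outside the expensive loop. Is there a known explanation for this?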

  // Generic version: the implicit Manifest lets Array.ofDim create an Array[T] at run time
  def xxxx_flttn[T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
    val I = v.length
    if (I <= 0) Array.ofDim[T](0)
    else {
      val J = v(0).length
      // every row must have as many elements as the first one
      for (i <- 1 until I) if (v(i).length != J) throw new IllegalArgumentException("2D matrix not rectangular. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[T](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }
  // Double-specific version: same logic with T fixed to Double
  def flttn(v: Array[Array[Double]]): Array[Double] = {
    val I = v.length
    if (I <= 0) Array.ofDim[Double](0)
    else {
      val J = v(0).length
      // every row must have as many elements as the first one
      for (i <- 1 until I) if (v(i).length != J) throw new IllegalArgumentException("2D matrix not rectangular. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[Double](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }

score 5 · Accepted Answer

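You can't really tell what you're measuring here (not very well, anyway), because a for loop is not as fast as a plain while loop, and the inner operation is very cheap. If we rewrite the code with while loops, the key double iteration is: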

  var i = 0
  while (i < I) {
    var j = 0
    while (j < J) {
      flt(i * J + j) = v(i)(j)
      j += 1
    }
    i += 1
  }
  flt
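For reference, here is the fragment above expanded into complete functions, in both a generic and a Double-only form. This is a sketch: the names `flattenGeneric` and `flattenDouble` are hypothetical, and it uses `ClassTag`, the post-2.10 replacement for `Manifest`, to build the generic array. The algorithm itself is the same as the question's.

```scala
import scala.reflect.ClassTag

// Hypothetical name. Generic while-loop flatten; ClassTag lets us allocate an Array[T].
def flattenGeneric[T: ClassTag](v: Array[Array[T]]): Array[T] = {
  val rows = v.length
  if (rows == 0) Array.empty[T]
  else {
    val cols = v(0).length
    require(v.forall(_.length == cols), "2D matrix is not rectangular; cannot be flattened")
    val flt = new Array[T](rows * cols)
    var i = 0
    while (i < rows) {
      var j = 0
      while (j < cols) {
        // generic access: on Scala 2 this compiles to ScalaRunTime.array_apply/array_update
        flt(i * cols + j) = v(i)(j)
        j += 1
      }
      i += 1
    }
    flt
  }
}

// Hypothetical name. Same loop with the element type fixed to Double.
def flattenDouble(v: Array[Array[Double]]): Array[Double] = {
  val rows = v.length
  if (rows == 0) Array.empty[Double]
  else {
    val cols = v(0).length
    require(v.forall(_.length == cols), "2D matrix is not rectangular; cannot be flattened")
    val flt = new Array[Double](rows * cols)
    var i = 0
    while (i < rows) {
      var j = 0
      while (j < cols) {
        flt(i * cols + j) = v(i)(j) // primitive daload / dastore, no boxing
        j += 1
      }
      i += 1
    }
    flt
  }
}
```

Both return the same values. Only the compiled element access differs, which is what the bytecode comparison below shows.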

With that rewrite, the bytecode for the generic case turns out to be dramatically different. Non-generic:

133:    checkcast   #174; //class "[D"
136:    astore  6
138:    iconst_0
139:    istore  5
141:    iload   5
143:    iload_2
144:    if_icmpge   191
147:    iconst_0
148:    istore  4
150:    iload   4
152:    iload_3
153:    if_icmpge   182
// The stuff above implements the loop; now we do the real work
156:    aload   6
158:    iload   5
160:    iload_3
161:    imul
162:    iload   4
164:    iadd
165:    aload_1
166:    iload   5
168:    aaload             // v(i)
169:    iload   4
171:    daload             // v(i)(j)
172:    dastore            // flt(.) = _
173:    iload   4
175:    iconst_1
176:    iadd
177:    istore  4
// Okay, done with the inner work, time to jump around
179:    goto    150
182:    iload   5
184:    iconst_1
185:    iadd
186:    istore  5
188:    goto    141

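This is just a bunch of jumps and low-level operations (daload and dastore are the key operations that load and store doubles from and to arrays). If we look at the key inner part of the generic bytecode, it looks like this: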

160:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
163:    aload   7
165:    iload   6
167:    iload   4
169:    imul
170:    iload   5
172:    iadd
173:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
176:    aload_1
177:    iload   6
179:    aaload
180:    iload   5
182:    invokevirtual   #107; //Method scala/runtime/ScalaRunTime$.array_apply:(Ljava/lang/Object;I)Ljava/lang/Object;
185:    invokevirtual   #111; //Method scala/runtime/ScalaRunTime$.array_update:(Ljava/lang/Object;ILjava/lang/Object;)V
188:    iload   5
190:    iconst_1
191:    iadd
192:    istore  5

As you can see, it has to call methods to do the array apply and update. The bytecode of those methods is a big mess of stuff like:

2:   aload_3 
3:   instanceof      #98; //class "[Ljava/lang/Object;"
6:   ifeq    18
9:   aload_3   
10:  checkcast       #98; //class "[Ljava/lang/Object;"
13:  iload_2
14:  aaload 
15:  goto    183
18:  aload_3
19:  instanceof      #100; //class "[I"
22:  ifeq    37
25:  aload_3   
26:  checkcast       #100; //class "[I"
29:  iload_2
30:  iaload 
31:  invokestatic    #106; //Method scala/runtime/BoxesRunTime.boxToInteger:
34:  goto    183
37:  aload_3
38:  instanceof      #108; //class "[D"
41:  ifeq    56
44:  aload_3   
45:  checkcast       #108; //class "[D"
48:  iload_2
49:  daload 
50:  invokestatic    #112; //Method scala/runtime/BoxesRunTime.boxToDouble:(
53:  goto    183

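Basically, it has to test every kind of array and, once it finds the one you have, box the element. Double is fairly close to the front (3rd of 10), but this is still a major overhead. That holds even if the JVM can recognize that the code ends up doing box/unbox, and so does not actually need to allocate memory. (I'm not sure it can do that, but even if it could, it wouldn't solve the problem.)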
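To make that dispatch concrete, here is a simplified imitation of what a generic array read has to do. `genericApply` is a hypothetical name, and this is not the actual `ScalaRunTime.array_apply` source. It only reproduces the shape seen in the bytecode above: one type test per kind of array, then boxing of a primitive element on the way out.

```scala
// Hypothetical helper: a simplified imitation of a generic array read.
// Not the real ScalaRunTime.array_apply, only the same overall shape.
def genericApply(xs: AnyRef, idx: Int): Any = xs match {
  case a: Array[AnyRef]  => a(idx) // Object[]: aaload, no boxing needed
  case a: Array[Int]     => a(idx) // int[]: iaload, then boxed to java.lang.Integer
  case a: Array[Double]  => a(idx) // double[]: daload, then boxed to java.lang.Double
  case a: Array[Long]    => a(idx)
  case a: Array[Float]   => a(idx)
  case a: Array[Char]    => a(idx)
  case a: Array[Byte]    => a(idx)
  case a: Array[Short]   => a(idx)
  case a: Array[Boolean] => a(idx)
  case other             => throw new IllegalArgumentException(s"not an array: $other")
}

// Every call pays for the type tests, plus a box when the array is primitive
val x: Any = genericApply(Array(1.5, 2.5), 1)
println(x.asInstanceOf[AnyRef].getClass.getName) // java.lang.Double
```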

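So what can you do? You can try [@specialized T], which will expand your code tenfold for you, as if you had written each primitive array operation yourself. Specialization is buggy in 2.9, though (it should be less so in 2.10), so it may not work the way you hope. If speed is essential, then first write while loops instead of for loops (or at least compile with -optimise, which speeds up for loops by a factor of two or so!). After that, consider either specialization or writing the code by hand for the types you need.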
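Below is a sketch of the specialization route, assuming Scala 2, where `@specialized` is implemented. As far as I know, Scala 3 accepts the annotation but does not generate the variants. The name `flattenSpecialized` is hypothetical. The method is restricted to `Double` and `Int` so that scalac does not emit all ten variants, it uses `ClassTag` instead of `Manifest`, and it keeps the while loops recommended above.

```scala
import scala.reflect.ClassTag

// Hypothetical name. On Scala 2, @specialized(Double, Int) asks scalac to also emit
// copies of this method for T = Double and T = Int, in which the array reads and
// writes should become primitive daload/dastore (iaload/iastore) instead of boxed calls.
def flattenSpecialized[@specialized(Double, Int) T: ClassTag](v: Array[Array[T]]): Array[T] = {
  val rows = v.length
  if (rows == 0) Array.empty[T]
  else {
    val cols = v(0).length
    require(v.forall(_.length == cols), "2D matrix is not rectangular; cannot be flattened")
    val flt = new Array[T](rows * cols)
    var i = 0
    while (i < rows) {
      val row = v(i) // hoist the row lookup out of the inner loop
      var j = 0
      while (j < cols) {
        flt(i * cols + j) = row(j)
        j += 1
      }
      i += 1
    }
    flt
  }
}
```

A call site only reaches a specialized copy when T is statically known to be Double or Int there. Running `javap -c` on the compiled class is the reliable way to check which variant you actually got.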

score 5 · Accepted Answer

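This is due to boxing. Boxing happens when you apply a generic to a primitive type and use arrays of it, or when the type appears as a plain type in method signatures or as a member.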

Example

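In the following trait, the process method takes an erased array after compilation. It is effectively an Array[Any], and on the JVM its parameter type is just Object.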

trait Foo[A]{
  def process(as: Array[A]): Int
}

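If you choose A to be a value/primitive type such as Double, the values have to be boxed. When the trait is written non-generically (e.g. with A=Double), process compiles to take an Array[Double], which is a different array type on the JVM. This is more efficient. To store a Double in an Array[Any], the Double must be wrapped (boxed) in an object, and a reference to that object is stored in the array. A dedicated Array[Double] can store its Doubles directly in memory as 64-bit values.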
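The storage difference can be observed at run time. This small sketch uses only the standard library. It compares the JVM element types of an Array[Double] and an Array[Any] that hold the same values.

```scala
// A primitive Array[Double] is a JVM double[]: the values are stored inline
val prim: Array[Double] = Array(1.0, 2.0)
// An Array[Any] is a JVM Object[]: each slot references a boxed java.lang.Double
val anys: Array[Any] = Array(1.0, 2.0)

println(prim.getClass.getComponentType)                // double
println(anys.getClass.getComponentType)                // class java.lang.Object
println(anys(0).asInstanceOf[AnyRef].getClass.getName) // java.lang.Double
```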

`@specialized` annotation

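If you feel adventurous, you can try the @specialized annotation (it is rather buggy and frequently crashes the compiler). It makes scalac compile special versions of a class for all primitive types or for selected ones. This only makes sense if the type parameter appears as a plain type in type signatures (as in get(a: A), but not get(as: Seq[A])) or as the type parameter of an Array. I think you get a warning if specialization is pointless.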
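Here is a sketch of that rule of thumb, assuming Scala 2. Scala 3 accepts the annotation but, as far as I know, does not generate specialized variants. `Holder` and its methods are hypothetical. One method takes A as a plain type, which is where specialization can help. The other only sees A inside Seq[A], where the elements stay boxed.

```scala
// Hypothetical class. On Scala 2, scalac should also emit subclasses for A = Double
// and A = Int in which `value` is stored as a primitive field.
class Holder[@specialized(Double, Int) A](val value: A) {
  // A appears as a plain type: specialized variants of this method are generated
  def replace(a: A): Holder[A] = new Holder(a)
  // A only appears inside Seq[A]: the elements are boxed objects either way
  def replaceWithHead(as: Seq[A]): Holder[A] = new Holder(as.head)
}

val h = new Holder(1.0).replace(2.5)
println(h.value) // 2.5
```

On Scala 2, `new Holder(1.0).getClass.getName` should end in `Holder$mcD$sp`, which is the specialized subclass for Double.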
