scalaz-stream - 如何使用 Sink 提高代码的性能？

Question

我对 scalaz-streams sinks 有奇怪的观察。他们工作缓慢。有谁知道这是为什么？有什么方法可以提高性能吗？

这是我的代码的相关部分：没有接收器的版本

//p is parameter with type p: Process[Task, Pixel]

def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
  img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}
val image = getBlankImage(2000, 4000)
val result = p.runLog.run
result.foreach(printToImage(image))

这需要大约 7 秒来执行

带水槽的版本

//p is the same as before

def printToImage(img: BufferedImage)(pixel: Pixel): Unit = {
  img.setRGB(pixel.x, pixel.y, 1, 1, Array(pixel.rgb), 0, 0)
}

//I've found that way of doing sink in some tutorial
def getImageSink(img: BufferedImage): Sink[Task, Pixel] = {
  //I've tried here Task.delay and Task.now with the same results
  def printToImageTask(img: BufferedImage)(pixel: Pixel): Task[Unit] = Task.delay {
    printToImage(img)(pixel)
  }
  Process.constant(printToImageTask(img))
}



val image = getBlankImage(2000, 4000)
val result = p.to(getImageSink(image)).run.run

这需要 33 秒来执行。由于存在显着差异，我在这里完全感到困惑。

score 7 · Accepted Answer

在第二种情况下，您为每个像素分配 Task，而不是直接调用 printToImage，而是通过 Task 来完成，这是调用链中的更多步骤。

我们经常使用 scalaz-stream，但我坚信用它来解决这类问题有点过头了。在 Process/Channel/Sink 中运行的代码应该比简单的变量分配/更新复杂得多。

我们使用接收器将数据从流写入数据库（Cassandra），并且我们使用批处理，写入单个行的开销很高。Process/Sinks 是超级方便的抽象，但适用于更高级的工作流。当编写 for-loop 很容易时，我建议编写 for-loop。

scalaz-stream - 如何使用 Sink 提高代码的性能？

1 回答 1

Related

Reference