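I am developing an algorithm in Ruby for a real-time data analysis task. Because the dataset is fairly large, the bottleneck is the CPU. To reach the required performance, I therefore need to use more cores in parallel, possibly on different machines.

My question is whether there is an existing Ruby library that provides the following: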

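  • Cluster management, ideally masterless, with dynamic reconfiguration (nodes joining and leaving) and some degree of fault tolerance
  • Distribution of compute jobs to the (active) nodes, with error handling (job retries, etc.)
  • Fast (direct?) communication to ensure real-time capability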
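
To make the second requirement concrete, here is a minimal single-machine sketch of job distribution with retries. It is only an illustration under stated assumptions, not a cluster solution: it uses forked processes (on MRI, threads do not run Ruby code in parallel because of the global VM lock), it needs a platform with `fork`, and the helper names `spawn_worker` and `parallel_map_with_retries` are hypothetical.

```ruby
# Sketch only: fork one child per input, collect results over pipes, and
# retry failed jobs. Results must be serializable with Marshal.

# Forks a child that runs the given block on `input` and writes the outcome
# to a pipe. Returns the child's pid and the read end of the pipe.
def spawn_worker(input)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode
  pid = fork do
    reader.close
    message =
      begin
        [:ok, yield(input)]
      rescue StandardError => e
        [:error, e.message]
      end
    writer.write(Marshal.dump(message))
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  [pid, reader]
end

# Runs the block once per input in parallel child processes and retries
# failed jobs up to `max_attempts` times. There is no pool size limit here;
# every pending job gets its own process.
def parallel_map_with_retries(inputs, max_attempts: 3, &work)
  results = Array.new(inputs.size)
  attempts = Hash.new(0)
  pending = inputs.each_index.to_a

  until pending.empty?
    running = pending.map do |i|
      attempts[i] += 1
      [i, *spawn_worker(inputs[i], &work)]
    end
    pending = []

    running.each do |i, pid, reader|
      payload = reader.read
      reader.close
      Process.wait(pid)
      status, value =
        payload.empty? ? [:error, 'worker died'] : Marshal.load(payload)

      if status == :ok
        results[i] = value
      elsif attempts[i] < max_attempts
        pending << i # retry in the next round
      else
        raise "job #{i} failed after #{attempts[i]} attempts: #{value}"
      end
    end
  end
  results
end

p parallel_map_with_retries([1, 2, 3, 4]) { |x| x * x } # => [1, 4, 9, 16]
```

What this does not cover is exactly what the list asks for: membership across machines, nodes joining and leaving, and low-latency communication between them.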

Things I have already looked at:

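  • DRb: too low-level, manual node handling, no fault tolerance?
  • DCell: mature? Automatic cluster management?
  • Resque/Sidekiq: nice, but too slow (polling Redis, sleeping workers, ...)
  • Riak Map/Reduce: nice, but not recommended for real-time queries
  • Spark: complex, enterprisey?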
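
To illustrate what "manual node handling" means for DRb, here is a rough sketch that assumes a hand-maintained list of node URIs and a naive failover loop. The `SquareWorker` class, the `NODE_URIS` list, and the `with_first_live_node` helper are hypothetical, and the first URI is assumed to point at a port where nothing is listening. Only `DRb.start_service`, `DRb.uri`, `DRbObject.new_with_uri`, and `DRb::DRbConnError` come from the standard `drb` library.

```ruby
require 'drb'

# Hypothetical worker exposed to other processes over DRb.
class SquareWorker
  def compute(x)
    x * x
  end
end

# Start one "node" in this process; port 0 asks the OS for a free port.
DRb.start_service('druby://localhost:0', SquareWorker.new)

# Hand-maintained node list. The first entry is assumed to be a dead node
# (nothing is expected to listen on port 1).
NODE_URIS = ['druby://localhost:1', DRb.uri].freeze

# Yields a proxy for each node in turn and returns the first successful
# result. Unreachable nodes raise DRb::DRbConnError and are skipped.
def with_first_live_node(uris)
  uris.each do |uri|
    begin
      return yield(DRbObject.new_with_uri(uri))
    rescue DRb::DRbConnError
      next
    end
  end
  raise 'no live node found'
end

puts with_first_live_node(NODE_URIS) { |worker| worker.compute(7) }
```

Everything beyond this (discovering nodes, noticing when they leave, rebalancing work) would have to be written by hand, which is the concern raised in the DRb item.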

Last resort: maybe there is no solution for Ruby, but there is one for another platform? Perhaps Java (yes, JRuby!) or node.js.

1 Answer

If you're finding yourself with a CPU-bound problem that would benefit from greater scale and greater concurrency, I'd highly recommend checking out the Go language. Concurrency and parallelism aren't Ruby's strong suits, and in my experience trying to make them work is always an uphill battle.

You'll find that with Go, you'll be able to scale out to multiple cores and machines much more easily, with excellent communication between goroutines and a really nice concurrency-based router.

For an introduction to concurrency in Go, I'd check out Rob Pike's 'Concurrency Is Not Parallelism' talk.

Answered 2013-11-13T04:52:19.853