distributed-computing - How do I build a fault-tolerant application in Storm?

Question

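Short version of the question: how do I build a fail-safe word count program (topology) in Twitter Storm that produces accurate results even when failures occur? Is that even possible?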

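Long version: I am studying Twitter Storm and trying to understand how it is supposed to be used. I followed the tutorial and found the concept quite simple. However, the word count example outlined in the tutorial is not fault-tolerant, because the bolts keep some of the data in memory. Persisting the counts in a backend database does not fix this by itself: when a bolt fails, events are resubmitted to the beginning of the chain, so the same data is saved again and gets counted twice.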
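To make the double-counting concern concrete, here is a minimal sketch in plain Python rather than Storm code. The names `split_and_count` and `db` are hypothetical stand-ins for a counting bolt and the backend database; the second call plays the role of a tuple replayed after a failure.

```python
from collections import Counter


def split_and_count(sentence, db):
    """Hypothetical bolt logic: bump a persisted count for every word."""
    for word in sentence.split():
        db[word] += 1


db = Counter()  # stand-in for the backend database
sentence = "the cow jumped over the moon"

split_and_count(sentence, db)  # first attempt; suppose a later bolt fails
split_and_count(sentence, db)  # the same tuple is replayed from the spout

print(db["the"])  # "the" occurs twice in the input but is now stored as 4
```

The database made the counts durable, but the increment is not idempotent, so every replay inflates the result.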

Should I treat Twitter Storm as a real-time platform that produces only partially accurate results, and still rely on MapReduce for accurate ones?

score 2 · Accepted Answer

It really depends on what kind of failure you are trying to guard against. There are a few things you can do:

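Storm bolts should acknowledge (ack) a tuple only after they have finished processing it. Acking alone means a failed tuple is replayed, so nothing is lost, but if you write your spout, bolts, and topology around this mechanism you can build an "exactly once" system that guarantees accuracy.
Kafka is a good way to get data into Storm, because it persists messages to disk and retains them for a long time, even after they have been consumed. This means you can retrieve them again if a consumer fails.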
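The two points above can be sketched together as a toy simulation. This is not the Storm or Kafka API: `ReplayableLog` and `run_consumer` are hypothetical names, standing in for a durable, offset-addressed source and for a worker that acks only after processing.

```python
class ReplayableLog:
    """Stand-in for a durable source that keeps messages after they are read."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the first message not yet acked

    def pending(self):
        return list(enumerate(self.messages))[self.committed:]

    def ack(self, offset):
        self.committed = offset + 1


def run_consumer(log, process, crash_before_ack=None):
    """Process pending messages in order, acking each one only afterwards."""
    for offset, msg in log.pending():
        process(msg)
        if offset == crash_before_ack:
            raise RuntimeError("simulated crash between processing and ack")
        log.ack(offset)


log = ReplayableLog(["a", "b", "c"])
seen = []
try:
    run_consumer(log, seen.append, crash_before_ack=1)
except RuntimeError:
    pass  # the worker died; "b" was processed but never acked
run_consumer(log, seen.append)  # a restarted worker resumes at the unacked offset

print(seen)  # ['a', 'b', 'b', 'c']
```

No message is lost, but "b" is processed twice. Ack-after-processing with a replayable source gives at-least-once delivery, so the processing step still has to tolerate duplicates to reach exactly-once results.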

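In general, though, it is hard to guarantee exactly-once processing in any streaming system. This is a well-known problem, and a difficult one to solve efficiently.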

score 0 · Accepted Answer

Storm has the concept of transactional topologies. In practice, this means you will want to process items in batches, then commit to your database at the end of the batch, storing the transaction ID in the database alongside a count. This also has the practical benefit of reducing the load on your database with fewer inserts.

Batches are processed in parallel and may be replayed on failure, but they are guaranteed to be committed in order. This is important because it makes it safe to write code that fetches the current count row and compares the transaction ID stored in it against the ID of the batch held in memory. If the two differ, the batch has not been committed yet, so the code adds the new count to the existing one and commits the updated count together with the new transaction ID. If they match, the batch is a replay and the row is left unchanged.
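The check described above might look roughly like the following sketch. It uses a plain dictionary in place of a database, and `CountStore` and `apply_batch` are hypothetical names, not part of Storm. It assumes that batches are committed in transaction-ID order and that a replayed batch carries the same tuples as the original.

```python
class CountStore:
    """Stand-in for a table mapping word -> (count, last committed txid)."""

    def __init__(self):
        self.rows = {}

    def apply_batch(self, txid, partial_counts):
        for word, delta in partial_counts.items():
            count, last_txid = self.rows.get(word, (0, None))
            if last_txid == txid:
                continue  # this batch was already committed for this word
            # Store the count and the txid together so a replay can be detected.
            self.rows[word] = (count + delta, txid)


store = CountStore()
store.apply_batch(1, {"the": 2, "cow": 1})
store.apply_batch(1, {"the": 2, "cow": 1})  # replay of batch 1 is a no-op
store.apply_batch(2, {"the": 1})

print(store.rows)  # {'the': (3, 2), 'cow': (1, 1)}
```

Comparing only against the most recent transaction ID is enough here because of the ordering guarantee: once batch 2 has been committed, batch 1 can no longer be applied to the store. In a real database the count and the transaction ID would need to be written atomically in the same row update.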

See the following link for much more information and code examples:

https://github.com/nathanmarz/storm/wiki/Transactional-topologies
