Problem: I have a number of file uploads coming in via HTTP in parallel (uploads receiver). I'm storing them temporarily on a local disk. Another process (uploads submitter) gets notified about new uploads and does specific processing (parsing, extracting metadata, uploading to S3, etc.). Once processing of an upload is done, I want the submitter to notify the uploads receiver so that it can reply to the remote uploader with a status (whether the submission succeeded or failed). Using the ZeroMQ PUB/SUB pattern, which would be better:

  • subscribe all upload receiver threads to a single topic. Each receiver thread would have to filter messages by upload id (or something similar) to find the notification that belongs to it.
  • subscribe each receiver thread to a new topic that represents a particular upload. This one seems more reasonable, assuming topics are cheap in ZeroMQ, i.e. not many resources are needed to keep them and they can be auto-expired. I expect new uploads to arrive at dozens of files per second, and processing a single upload may take up to several seconds, so theoretically I could have up to a thousand topics active at the same time. Also, I may not always be able to unsubscribe due to various failure modes.
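One detail matters for the topic-per-upload option: a ZeroMQ SUB subscription is a byte-prefix filter on the start of the message, so a subscription for upload `42` also accepts a message for upload `420`. The sketch below models that prefix test in plain Python; the `upload:<id>|` naming scheme and the helper names are hypothetical, chosen only to show why a terminating delimiter is useful.

```python
def upload_topic(upload_id: str) -> bytes:
    """Hypothetical per-upload topic: fixed prefix + id + terminating delimiter."""
    return b"upload:" + upload_id.encode("ascii") + b"|"


def sub_filter_accepts(subscription: bytes, message: bytes) -> bool:
    """Model of a SUB-side filter: a plain byte-prefix test.

    An empty subscription is a prefix of every message, so it accepts everything.
    """
    return message.startswith(subscription)


# Without a delimiter, the subscription for upload 42 also accepts upload 420:
assert sub_filter_accepts(b"upload:42", b"upload:420 OK")
# With the delimiter, only the intended upload matches:
assert not sub_filter_accepts(upload_topic("42"), upload_topic("420") + b"OK")
assert sub_filter_accepts(upload_topic("42"), upload_topic("42") + b"OK")
```

With pyzmq, a receiver thread would pass such a topic to `socket.setsockopt(zmq.SUBSCRIBE, upload_topic(upload_id))` and the submitter would send messages that begin with the same bytes.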

On using different ZeroMQ versions:

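While newer versions may filter topics on the PUB side, earlier ZeroMQ versions filtered on the SUB side. This means that all message traffic went over the network to every SUB, and each SUB discarded what it had not subscribed to. That extra traffic was the accepted penalty for distributing the filtering workload, which would otherwise have to be processed on the PUB side with the lowest possible latency.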
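To put a rough number on that penalty, here is a toy count of how many messages are transmitted to subscribers in a one-PUB, many-SUB setup, with filtering done either at the SUB side or at the PUB side. The load figures (50 notifications, 1,000 subscribers each holding one per-upload topic) are assumptions loosely based on the question's "dozens of files per second" and "up to a thousand topics"; the model ignores transports, batching and queueing entirely.

```python
from typing import Sequence


def wire_deliveries(messages: Sequence[bytes],
                    subscriptions: Sequence[bytes],
                    pub_side: bool) -> int:
    """Count messages transmitted to subscribers in a toy PUB/SUB model.

    Each element of `subscriptions` stands for one SUB socket with a single
    prefix filter. With SUB-side filtering every message is sent to every SUB
    (which then drops non-matching ones); with PUB-side filtering only
    matching messages are sent.
    """
    total = 0
    for msg in messages:
        for sub in subscriptions:
            if not pub_side or msg.startswith(sub):
                total += 1
    return total


# Assumed load: 50 notifications, 1,000 receivers with one per-upload topic each.
msgs = [f"upload:{i}|OK".encode() for i in range(50)]
subs = [f"upload:{i}|".encode() for i in range(1000)]

print("SUB-side filtering:", wire_deliveries(msgs, subs, pub_side=False))  # 50000
print("PUB-side filtering:", wire_deliveries(msgs, subs, pub_side=True))   # 50
```

In this model SUB-side filtering multiplies traffic by the number of subscribers, while PUB-side filtering sends each notification only to the receiver that asked for it.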



On the scaling range: the limits are still well beyond your use case.

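As Martin Sustrik (one of the co-fathers of ZeroMQ) has detailed, ZeroMQ's subscription matching was designed for up to about 10,000 subscriptions, and some users go far beyond that: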

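(cit.:) "Efficient subscription matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions, where a simple trie works well. However, there are users who use as many as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure."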
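To illustrate why a trie suits prefix subscriptions, here is a minimal byte-prefix trie in Python. It is a simplified model written for this answer, not libzmq's actual data structure: the point is that matching walks the message byte by byte, so its cost is bounded by the message length (or trie depth), not by the number of subscriptions. A thousand per-upload topics sharing an `upload:` prefix is well inside the 10,000 the quote mentions. The `upload:<id>|` topics are hypothetical.

```python
class SubscriptionTrie:
    """Toy byte-prefix trie for SUB-style matching (illustrative only)."""

    def __init__(self) -> None:
        # Each node is a dict mapping a byte value (int) to a child node.
        # The key None holds how many identical subscriptions end at that node.
        self.root: dict = {}

    def subscribe(self, prefix: bytes) -> None:
        node = self.root
        for byte in prefix:
            node = node.setdefault(byte, {})
        node[None] = node.get(None, 0) + 1

    def unsubscribe(self, prefix: bytes) -> bool:
        """Remove one subscription; return False if it was not present."""
        node = self.root
        for byte in prefix:
            node = node.get(byte)
            if node is None:
                return False
        count = node.get(None, 0)
        if count == 0:
            return False
        if count == 1:
            del node[None]  # empty branches are simply left in place in this toy
        else:
            node[None] = count - 1
        return True

    def matches(self, message: bytes) -> bool:
        """True if any subscribed prefix is a prefix of `message`."""
        node = self.root
        if node.get(None):
            return True  # the empty subscription matches everything
        for byte in message:
            node = node.get(byte)
            if node is None:
                return False
            if node.get(None):
                return True
        return False


trie = SubscriptionTrie()
for i in range(1000):  # ~1,000 concurrently active uploads, as in the question
    trie.subscribe(f"upload:{i}|".encode())
print(trie.matches(b"upload:42|OK"))    # True
print(trie.matches(b"upload:4200|OK"))  # False
```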

More details on the design and scaling can be found in this article by Martin.


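A fair approach is to mock up each of the approaches in question and benchmark it in vitro at { 1.0x, 1.5x, 2.0x, 5.0x } of the expected static scale, so as to get quantitative supporting data on the actual overheads, performance and latencies of the alternative strategies under review.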
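The scaling sweep described above can be sketched as a small harness that runs a workload at each multiple of an expected baseline and keeps the best wall-clock time. The two workloads here are CPU-only stand-ins, not ZeroMQ code: for meaningful numbers, replace them with functions that drive real PUB/SUB sockets and measure end-to-end notification latency. The function names and the baseline value are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Sequence

SCALES = (1.0, 1.5, 2.0, 5.0)  # multiples of the expected load, as suggested above


def benchmark(run: Callable[[int], None], baseline: int,
              scales: Sequence[float] = SCALES,
              repeats: int = 3) -> Dict[float, float]:
    """Return best-of-`repeats` wall-clock seconds of run(n) per scale,
    where n = baseline * scale (e.g. number of concurrently active uploads)."""
    results: Dict[float, float] = {}
    for scale in scales:
        n = max(1, round(baseline * scale))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            best = min(best, time.perf_counter() - start)
        results[scale] = best
    return results


def toy_single_topic(n: int) -> None:
    # Stand-in for strategy 1: each of n notifications reaches all n receivers,
    # and every receiver compares the upload id with its own.
    hits = sum(1 for notified in range(n) for mine in range(n) if notified == mine)
    assert hits == n


def toy_topic_per_upload(n: int) -> None:
    # Stand-in for strategy 2: one lookup per notification, keyed by topic
    # (an exact-key dict, a simplification of prefix matching).
    subscribers = {f"upload:{i}|".encode(): i for i in range(n)}
    hits = sum(1 for i in range(n) if f"upload:{i}|".encode() in subscribers)
    assert hits == n


if __name__ == "__main__":
    # Baseline kept small so the toy runs quickly; use your real expected load.
    for name, fn in (("single topic", toy_single_topic),
                     ("topic per upload", toy_topic_per_upload)):
        timings = benchmark(fn, baseline=200)
        print(name, {s: f"{t * 1e3:.2f} ms" for s, t in timings.items()})
```

Taking the best of several repeats reduces noise from other processes; for latency you would rather record the full distribution (e.g. percentiles) than a single minimum.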


