
I've been trying to implement a simple long polling service for use in my own projects, and maybe release it as a SaaS if I succeed. These are the two approaches I've tried so far, both using Node.js (polling PostgreSQL on the back end).

1. Periodically check all the clients at the same interval

Every new connection is pushed onto a queue of connections, which is walked through at a fixed interval.

var queue = [];

function acceptConnection(req, res) {
  res.setTimeout(5000);
  queue.push({ req: req, res: res });
}

function checkAll() {
  queue.forEach(function(client) {
    // respond if there is something new for the client
  });
}

// this could be replaced with a timeout after all the clients are served
setInterval(checkAll, 500);

2. Check each client at a separate interval

Every client gets its own ticker, which checks for new data.

function acceptConnection(req, res) {
  // something which periodically checks data for the client
  // and responds if there is anything new
  new Ticker(req, res);
}

While this keeps the minimum latency for each client lower, it also introduces overhead by setting up a large number of timers.
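
For reference, a Ticker here might look roughly like this (just a sketch; checkForNewData is a placeholder for the actual database check, and the 500 ms and 30 s values are arbitrary):

function Ticker(req, res) {
  var self = this;

  // poll for new data every 500 ms (arbitrary value)
  this.interval = setInterval(function () {
    // checkForNewData is a hypothetical helper that queries the database
    checkForNewData(req, function (err, data) {
      if (err || !data) return; // nothing new yet, keep waiting
      self.stop();
      res.end(JSON.stringify(data));
    });
  }, 500);

  // give up after 30 seconds so the client can re-poll
  this.timeout = setTimeout(function () {
    self.stop();
    res.writeHead(204); // no new content
    res.end();
  }, 30000);
}

Ticker.prototype.stop = function () {
  clearInterval(this.interval);
  clearTimeout(this.timeout);
};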

Conclusion

Both of these approaches solve the problem easily enough, but I don't feel that this will scale up to something like 10 million open connections, especially since I'm polling the database on every check, for every client.

I thought about doing this without the database and just broadcasting new messages immediately to all open connections, but that will fail if a client's connection drops for a few seconds while the broadcast is happening, because nothing is persisted. That means I basically need to be able to look up messages in history when a client polls for the first time.
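
Roughly, what I mean by looking up history is something like this (only a sketch; the in-memory store and the 10,000-entry cap are arbitrary assumptions):

var history = [];  // messages kept in memory, newest last
var nextSeq = 1;

function publish(payload) {
  history.push({ seq: nextSeq++, payload: payload });
  // trim old entries so memory stays bounded (arbitrary limit)
  if (history.length > 10000) history.shift();
}

// a client polls with the last sequence number it has seen,
// so anything it missed while disconnected can be replayed
function messagesSince(since) {
  return history.filter(function (msg) {
    return msg.seq > since;
  });
}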

I guess one step up here would be to have a data source I can subscribe to for new data coming in (CouchDB change notifications?), but maybe I'm missing something in the big picture here?
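
For example (just a sketch, not something I've tried), since PostgreSQL is already in the picture, I could presumably use its LISTEN/NOTIFY mechanism through node-postgres so the app gets told about new data instead of polling for it; the connection string, the channel name, and the broadcastToWaitingClients helper below are made up:

var pg = require('pg');

var client = new pg.Client('postgres://user:pass@localhost/mydb'); // placeholder connection string
client.connect(function (err) {
  if (err) throw err;
  client.query('LISTEN new_messages'); // channel name is an assumption
});

client.on('notification', function (msg) {
  // msg.payload carries whatever was sent with NOTIFY;
  // here we would wake up the waiting long-poll responses
  broadcastToWaitingClients(msg.payload); // hypothetical helper
});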

What is the usual approach for doing highly scalable long polling? I'm not specifically bound to Node.js; I'd actually prefer any other suggestion, with reasoning why.


2 Answers


Not sure whether this answers your question, but I like PushPin's approach (plus the explanation of the concept).

I like the idea (using a reverse proxy and communicating via return codes, plus delayed REST return requests), but I do have reservations about the implementation. I may be underestimating the problem, but to me the technology used seems like a bit of overkill. I'm not sure whether I would use it myself, since I'd prefer a more lightweight solution, but I find the concept excellent.

Would love to hear what you ended up using.

answered 2013-05-15T09:45:04.030

Since you mention scalability, I have to get a little theoretical here, because the only practical measure is load testing. So all I can offer is advice.

Generally speaking, once-per-something things are bad for scalability, especially once-per-connection or once-per-request, because that makes a part of your application proportional to the amount of traffic. Node.js removes the thread-per-connection dependency with its single-threaded asynchronous I/O model. Of course, you can't entirely eliminate having something per connection, such as the request and response objects and a socket.

I would suggest avoiding anything that opens a database connection for every HTTP connection; that is what connection pools are for.
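
For example (only a sketch, and the connection string, pool size, and query below are placeholders), with node-postgres you can share a single pool across all requests rather than opening a connection per request; the exact API may differ by version:

var pg = require('pg');

var pool = new pg.Pool({
  connectionString: 'postgres://user:pass@localhost/mydb', // placeholder
  max: 20 // at most 20 database connections, regardless of HTTP traffic
});

function checkForNewData(since, callback) {
  // every caller shares the same pool instead of opening its own connection
  pool.query('SELECT * FROM messages WHERE seq > $1', [since], function (err, result) {
    if (err) return callback(err);
    callback(null, result.rows);
  });
}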

As for choosing between your two options, I would personally go with the second one, because it keeps each connection isolated. The first option uses a loop over the connections, which means actual execution time per connection. Given that the I/O is asynchronous, that probably isn't a big deal, but if I had to choose between a per-connection iteration and the mere existence of a per-connection object, I'd rather have just the object. Then I don't have to worry when there are suddenly 10,000 connections.

The C10K problem seems like a good reference here, although honestly it really comes down to judgment.

http://www.kegel.com/c10k.html

http://en.wikipedia.org/wiki/C10k_problem

answered 2013-03-03T22:01:35.187