I'm working on a server push service, kind of like Pusher/PubNub. The most critical part, which handles the client polling, currently runs on Node.js and Redis, and it's working just fine, except for one thing: I don't really have a good idea of what's going on inside the app.
The whole thing is based around long polling, which means there are A LOT of requests going back and forth, and a lot of checking Redis to see if there's anything new. The problem is that I don't really know how to monitor something like that.
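For context, each poll handler does roughly this (a simplified sketch, not my actual code; the `messages:<userId>` key and the routing are placeholders, and it assumes the callback-style node_redis client):

```js
var http = require('http');
var redis = require('redis');

var client = redis.createClient();

// Every poll request checks a per-user list for new messages and
// answers right away; the client polls again a few seconds later.
http.createServer(function (req, res) {
  var userId = req.url.slice(1); // e.g. GET /12345 (placeholder routing)

  // LPOP returns null when nothing is queued for this user.
  client.lpop('messages:' + userId, function (err, message) {
    if (err) {
      res.writeHead(500);
      return res.end();
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ message: message }));
  });
}).listen(8080);
```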
Let's say there are 10,000 users online on average on a single site. With a polling interval of 5 seconds, that works out to about 2,000 requests per second, and therefore at least 2,000 log entries per second if every request is logged. How do I manage that many logs? Should I use something like Logstash to collect them, just to have some idea of what's going on in the app?
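One idea I've been toying with instead of logging every single request is to keep in-memory counters and flush one aggregate line every few seconds; a rough sketch (the counter names are made up):

```js
// Count events in memory and emit one summary line every 10 seconds
// instead of one log line per request.
var counters = { polls: 0, deliveries: 0, errors: 0 };

function count(name) {
  counters[name] = (counters[name] || 0) + 1;
}

setInterval(function () {
  console.log(JSON.stringify({ ts: Date.now(), counters: counters }));
  counters = { polls: 0, deliveries: 0, errors: 0 };
}, 10000);
```

But I'm not sure whether that loses too much detail compared to shipping everything to something like Logstash.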
Would it be a good idea to have all instrumentation turned off by default, and only enable it on demand with something like `kill -s USR2`?
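What I have in mind is roughly this (just a sketch; `instrumentationEnabled` and `trace` are made-up names):

```js
// Toggle verbose instrumentation at runtime with `kill -s USR2 <pid>`
// instead of paying the logging cost all the time.
var instrumentationEnabled = false;

process.on('SIGUSR2', function () {
  instrumentationEnabled = !instrumentationEnabled;
  console.log('instrumentation ' + (instrumentationEnabled ? 'on' : 'off'));
});

function trace(line) {
  if (instrumentationEnabled) console.log(line);
}
```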
I thought about using the Redis MONITOR command to gather some data, but just running it slows Redis down by about 50%, not to mention the overhead of analyzing the incoming stream.
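The best I've come up with so far is to only attach MONITOR for a few seconds at a time and tally commands from its output, roughly like this (assumes redis-cli is on the PATH; the parsing is naive):

```js
var spawn = require('child_process').spawn;

// Attach MONITOR for a short window only, since leaving it running
// permanently is what hurts throughput.
function sampleCommands(seconds) {
  var cli = spawn('redis-cli', ['monitor']);
  var counts = {};

  cli.stdout.on('data', function (chunk) {
    // MONITOR lines look like: <ts> [<db> <addr>] "COMMAND" "arg" ...
    chunk.toString().split('\n').forEach(function (line) {
      var m = line.match(/\] "(\w+)"/);
      if (m) counts[m[1]] = (counts[m[1]] || 0) + 1;
    });
  });

  setTimeout(function () {
    cli.kill();
    console.log('commands seen in ' + seconds + 's:', counts);
  }, seconds * 1000);
}

sampleCommands(5);
```

Even that feels hacky, though.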
How do people generally handle this? Is there a good book or other resource on building high-load, high-availability applications?