We have a huge Redis database containing about 100 million keys, mapping phone numbers to hashes of data.
Once in a while all this data needs to be aggregated and saved to an SQL database. During aggregation we need to iterate over all the stored keys and look at each of those hashes.
Using Redis.keys is not a good option because it retrieves and stores the whole list of keys in memory, and it takes a very long time to complete. We need something that returns an enumerator we can use to iterate over all the keys, like so:
redis.keys_each { |k| agg(k, redis.hgetall(k)) }
Is this even possible with Redis?
This would prevent Ruby from building an array of 100 million elements in memory, and would probably be much faster. Profiling shows that the Redis.keys command makes Ruby peg the CPU at 100%, while the Redis process appears to be idle.
I know that using keys is discouraged in favor of maintaining a set of the keys yourself, but even if we build such a set and retrieve it with smembers, we'd have the same problem.
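To make the shape of what we're after concrete, here is a minimal sketch of such a keys_each built on a cursor-based batching protocol like Redis's SCAN command. The FakeRedis class below is a made-up in-memory stand-in (not a real client), just so the iteration pattern is self-contained:

```ruby
# Hypothetical in-memory stand-in for a Redis client, mimicking the
# SCAN cursor protocol: each call returns [next_cursor, batch_of_keys],
# and a cursor of "0" signals that the iteration is complete.
class FakeRedis
  def initialize(keys)
    @keys = keys
  end

  def scan(cursor, count: 10)
    start = cursor.to_i
    batch = @keys[start, count] || []
    next_cursor = start + count >= @keys.size ? 0 : start + count
    [next_cursor.to_s, batch]
  end
end

# The enumerator-style helper we wish for: stream keys in small batches,
# never materializing the full key list in Ruby's memory.
def keys_each(redis, count: 10)
  cursor = "0"
  loop do
    cursor, batch = redis.scan(cursor, count: count)
    batch.each { |k| yield k }
    break if cursor == "0"
  end
end

fake = FakeRedis.new((1..25).map { |i| "phone:#{i}" })
seen = []
keys_each(fake, count: 7) { |k| seen << k }
# seen now holds all 25 keys, collected batch by batch
```

If I understand correctly, recent versions of the redis-rb gem expose something along these lines as scan_each, which returns an Enumerator over SCAN batches, but I'd welcome confirmation that this scales to our key count.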