I'm looking for a database matching these criteria:
- Persistence is not required
- Almost all keys in the DB need to be updated once every 3-6 hours (100M+ keys with a total size of 100 GB)
- Fast lookups by key (or primary key)
- This needs to be a DBMS (so LevelDB doesn't fit)
- While data is being written, the DB cluster must still be able to serve queries (individual nodes may be blocked, though)
- Not in-memory – our dataset will exceed the RAM limits
- Horizontal scaling and replication
- Support for a full rewrite of all data (MongoDB doesn't reclaim disk space after data is deleted)
- C# and Java support

Here's how we would use such a database: we have an analytics cluster that produces 100M records (50 GB) of data every 4-6 hours. Each record is a "key - array[20]" pair. This data needs to be served to users through a front-end system at a rate of 1-10k requests per second. On average, only ~15% of the data is ever requested; the rest is never read and is overwritten 4-6 hours later, when the next data set is generated.
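
For a rough sense of scale, here is a back-of-envelope sketch of the load these figures imply. It uses only the numbers from the paragraph above (100M records, 50 GB, a 4-6 hour cycle, ~15% of the data read). It assumes 1 GB = 10^9 bytes and that each bulk load is spread evenly over the whole cycle, which is the most optimistic case; a real load would probably need to finish faster. The function and variable names are just for illustration.

```python
# Back-of-envelope sizing from the figures in the question.
# Assumptions (not from any benchmark): 1 GB = 10**9 bytes, and the bulk
# load is spread evenly across the whole refresh window.

RECORDS = 100_000_000          # records produced per cycle
DATASET_BYTES = 50 * 10**9     # 50 GB per cycle
WINDOW_HOURS = (4, 6)          # shortest and longest refresh window
HOT_FRACTION = 0.15            # share of the data that is actually read


def sizing(records, dataset_bytes, window_hours, hot_fraction):
    """Return average per-record size, write rates and hot-set size."""
    seconds = window_hours * 3600
    return {
        "bytes_per_record": dataset_bytes / records,
        "writes_per_sec": records / seconds,
        "write_mb_per_sec": dataset_bytes / seconds / 1e6,
        "hot_records": records * hot_fraction,
        "hot_gb": dataset_bytes * hot_fraction / 1e9,
    }


for hours in WINDOW_HOURS:
    est = sizing(RECORDS, DATASET_BYTES, hours, HOT_FRACTION)
    print(
        f"{hours}h window: {est['writes_per_sec']:.0f} writes/s, "
        f"{est['write_mb_per_sec']:.2f} MB/s, "
        f"{est['bytes_per_record']:.0f} B/record, "
        f"hot set {est['hot_gb']:.1f} GB ({est['hot_records']:.0f} records)"
    )
```

Under those assumptions, that is about 500 bytes per record, somewhere between roughly 4,600 and 6,900 writes per second sustained during a load, and a hot set of about 15M records (7.5 GB) serving the 1-10k reads per second.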

What I tried:
- MongoDB: data storage overhead and high defragmentation costs.
- Redis: looks perfect, but it's limited by RAM, and our data exceeds it.

So the question is: is there anything like Redis, but not limited by RAM size?