A bit of brainstorming here.
I'm searching for the most suitable distributed storage solution: an efficient key/value store with a flat namespace and minimal latency.
Scenario
I plan to store small blob records, 1 KB or less. They mostly follow a produce/consume pattern:
- 1 write
- 1 read, rarely more.
- deleted after several months, for archiving.
However, some records may grow up to 10 MB; that's the maximum, but it must be supported.
The data must be serialized to disk.
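To make the workload concrete, here is a minimal sketch of it (all names hypothetical, and a plain dict standing in for whatever store gets picked): records are mostly ≤1 KB with rare outliers up to 10 MB, and each one is written once, read once, and eventually deleted.

```python
import os
import random

# Hypothetical model of the workload described above:
# mostly small records (<= 1 KB), rare outliers up to 10 MB.
MAX_SMALL = 1 * 1024          # typical record: 1 KB or less
MAX_LARGE = 10 * 1024 * 1024  # hard upper bound: 10 MB

def make_record(rng: random.Random) -> bytes:
    """Generate one blob: ~99% small, ~1% large (assumed ratio)."""
    if rng.random() < 0.99:
        size = rng.randint(1, MAX_SMALL)
    else:
        size = rng.randint(MAX_SMALL + 1, MAX_LARGE)
    return os.urandom(size)

def lifecycle(store: dict, key: str, value: bytes) -> bytes:
    """Produce/consume pattern: 1 write, 1 read, then delete (archival)."""
    store[key] = value        # 1 write
    read_back = store[key]    # 1 read (rarely more)
    del store[key]            # deleted after several months
    return read_back
```

Any candidate store would be exercised through the same three operations; only the `dict` stand-in changes.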
Important
My first priority is a store that provides good response times over a very large number of records, possibly several hundred million.
At that scale I don't care about iterating over the records (I'd like the functionality, but performance there only matters for debugging or maintenance).
And of course a solution that scales; one without a SPOF would be even better.
It must run on Linux, and no cloud services are allowed (private data).
What I've found so far
I looked at Voldemort, Cassandra and HBase.
- I'm afraid that Cassandra and HBase are not really efficient for blob records.
- Voldemort still looks immature, and I can't find information about supported record sizes or record counts.
I also checked Lustre and Ceph, but they're not key/value stores.
Couchbase and MongoDB showed terrible performance with persistence enabled.
I'm running some tests but can't launch solid benchmarks just yet. Does anyone have information about these solutions, or know of another product designed for such a workload?
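For whatever it's worth, the shape of the micro-benchmark I'm after looks roughly like this sketch (the dict-like `store` argument is a stand-in for a real KV client, so the numbers it produces here are meaningless; only the measurement structure matters):

```python
import statistics
import time

def bench_put_get(store, n: int = 1000, value: bytes = b"x" * 1024):
    """Rough per-operation read-latency measurement against a dict-like store.

    Writes n records of ~1 KB, reads each one back once, and returns
    (median, p99) read latency in seconds. The store argument is a
    hypothetical stand-in for whatever client is being benchmarked.
    """
    get_latencies = []
    for i in range(n):
        key = f"rec-{i}"
        store[key] = value                      # 1 write per record
        t0 = time.perf_counter()
        _ = store[key]                          # 1 read per record
        get_latencies.append(time.perf_counter() - t0)
    get_latencies.sort()
    median = statistics.median(get_latencies)
    p99 = get_latencies[int(0.99 * (n - 1))]    # simple nearest-rank p99
    return median, p99
```

Tail latency (p99) matters more than the median here, since the whole point is predictable response times over hundreds of millions of records.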