
A bit of brainstorming here.

I'm looking for the most suitable distributed storage solution: an efficient key/value store with a flat namespace and minimal latency.

Scenario

I plan to store small blob records, 1 KB or less. They mostly follow a produce/consume pattern:

  • 1 write
  • 1 read, occasionally more.
  • deleted after several months, once archived.

However, some records may grow up to 10 MB; that is the maximum, but it must be supported.

The data must be persisted to disk.
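
To make that concrete, what I need from the store is roughly the interface below. This is only a minimal sketch in Java; every name in it is mine, not taken from any particular product.

    // Minimal key/value contract: flat namespace, binary values,
    // write-once / read-rarely / delete-later records.
    public interface BlobStore {

        /** Store a record; values are usually <= 1 KB but may reach 10 MB. */
        void put(String key, byte[] value);

        /** Fetch a record by key; typically called once per record. */
        byte[] get(String key);

        /** Remove a record once it has been archived (after several months). */
        void delete(String key);

        /** Full scan, only for debugging/maintenance; performance does not matter here. */
        Iterable<String> keys();
    }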

Important

My number one priority is a store that provides good response times over a really huge number of entries, possibly several hundred million.

Of course, at that scale I don't care about iterating over the entries (I want the functionality, but only for debugging or maintenance, so performance there doesn't matter).

And of course it should scale; without a single point of failure would be even better.

It must run on Linux, and no cloud services are allowed (the data is private).

What I have found so far

I looked at Voldemort, Cassandra and HBase.

  • I'm afraid Cassandra and HBase are not really efficient for blob records.
  • Voldemort still looks immature, and I can't find information about the record sizes and number of entries it supports.

I also checked Lustre and Ceph, but they're not key/value stores.

Couchbase and MongoDB show terrible performance with persistence enabled.

I'm running some tests but can't launch a solid benchmark just yet. Does anyone have information about these solutions, or know of another product designed for this kind of workload?


1 Answer


Have you looked at in-memory data grids such as Infinispan or Hazelcast? They scale very well and are extremely responsive, but storing 10 MB objects could become a problem if you ever plan to do any processing on those entries. Hazelcast, for example, allows tasks to be executed on the same cluster member that owns the target entry, which reduces the amount of data flowing between members.
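
For illustration, here is a rough sketch of what that pattern looks like with the Hazelcast 3.x map API; the map name, key and sizes are made up for the example, so treat it as a sketch rather than production code:

    import java.util.Map;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.map.AbstractEntryProcessor;

    public class HazelcastBlobSketch {
        public static void main(String[] args) {
            // Start (or join) a cluster member; entries are partitioned across members.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // Flat namespace: one distributed map keyed by record id, value = raw blob.
            IMap<String, byte[]> records = hz.getMap("records");

            String key = "record-42";
            records.put(key, new byte[1024]);   // the single write
            byte[] blob = records.get(key);     // the (usually single) read

            // The processor runs on the member that owns the key, so a large blob
            // is not shipped across the network just to be inspected.
            Object size = records.executeOnKey(key, new AbstractEntryProcessor<String, byte[]>() {
                @Override
                public Object process(Map.Entry<String, byte[]> entry) {
                    byte[] value = entry.getValue();
                    return value == null ? 0 : value.length;
                }
            });
            System.out.println("Blob size computed on the owning member: " + size);

            records.delete(key);                // delete once archived
            Hazelcast.shutdownAll();
        }
    }

The point of the entry processor is that the computation moves to the data: only the small result travels between members, not the 10 MB value.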

answered 2013-09-24T05:59:36.000