
What is a good strategy for mass storage of millions of small files (~50 KB on average) with auto-pruning of files older than 20 minutes? I need to write and access them from the web server.

I am currently using ext4, and during deletes (scheduled via cron) HDD usage spikes up to 100%, with [flush-8:0] showing up as the process creating the load. This load interferes with other applications on the server. When there are no deletes, maximum HDD utilisation is 0-5%. The situation is the same with nested and non-nested directory structures. The worst part is that mass removal during peak load seems to be slower than the rate of insertion, so the number of files that need to be removed grows larger and larger.

I have tried changing I/O schedulers (deadline, cfq, noop); it didn't help. I have also tried running the removal script under ionice, but that didn't help either.

I have tried GridFS with MongoDB 2.4.3, and it performs nicely, but horribly during mass deletes of old files. I have tried running MongoDB with journaling turned off (nojournal) and without write confirmation for both deletes and inserts (w=0), and it didn't help. It only works fast and smoothly when there are no deletes going on.

I have also tried storing the data in MySQL 5.5, in a BLOB column in an InnoDB table, with innodb_buffer_pool_size=2G, innodb_log_file_size=1G, and innodb_flush_log_at_trx_commit=2, but performance was worse: HDD load was always at 80-100% (expected, but I had to try). The table had only the BLOB column, a DATETIME column, and a CHAR(32) latin1_bin UUID column, with indexes on the UUID and DATETIME columns, so there was no room for optimization and all queries used the indexes.

I have looked into the pdflush settings (pdflush being the Linux flush process that creates the load during mass removal), but changing the values didn't help, so I reverted to the defaults.

It doesn't matter how often I run the auto-pruning script: every second, every minute, every 5 minutes, or every 30 minutes, it disrupts the server significantly either way.

I have tried storing each file's inode number and, when removing, deleting old files sequentially by sorting them by inode number first, but it didn't help.

I am using CentOS 6. The drive is an SSD RAID 1 array.

What would be a good and sensible solution for my task that solves the auto-pruning performance problem?


2 Answers


If mass-deleting millions of files causes a performance problem, you can get around it by "deleting" all the files at once. Instead of using any filesystem operation (such as "delete" or "truncate"), you create a new (empty) filesystem in place of the old one.

To implement this idea, you need to split your drive into two (or more) partitions. After one partition is full (or after 20 minutes), you start writing to the second partition while using the first one as read-only. After another 20 minutes, you unmount the first partition, create an empty filesystem on it, mount it again, and then start writing to the first partition while using the second one as read-only.

The simplest solution is to use just two partitions. But that way you don't use disk space very efficiently: you can store only half as many files on the same drive. With more partitions you can increase space efficiency.
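A minimal sketch of the rotation job, meant to run from cron as root. The three-partition layout, the device names and mountpoints, and the "current" symlink are all assumptions for illustration, not part of the original answer:

    import os
    import subprocess

    # Hypothetical layout: three partitions rotated every 20 minutes.
    PARTS = ["/dev/sdb1", "/dev/sdb2", "/dev/sdb3"]
    MOUNTS = ["/srv/blobs/0", "/srv/blobs/1", "/srv/blobs/2"]
    CURRENT_LINK = "/srv/blobs/current"   # where the web server writes
    STATE_FILE = "/srv/blobs/active"      # index of the slot being written

    def read_active():
        try:
            with open(STATE_FILE) as f:
                return int(f.read())
        except (OSError, ValueError):
            return 0

    def rotate():
        nxt = (read_active() + 1) % len(PARTS)
        # Wipe the oldest slot by recreating its filesystem: one cheap
        # operation instead of millions of unlink() calls.
        subprocess.check_call(["umount", MOUNTS[nxt]])
        subprocess.check_call(["mkfs.ext4", "-q", PARTS[nxt]])
        subprocess.check_call(["mount", PARTS[nxt], MOUNTS[nxt]])
        # Repoint "current" so writers switch over to the fresh slot.
        tmp = CURRENT_LINK + ".tmp"
        if os.path.lexists(tmp):
            os.remove(tmp)
        os.symlink(MOUNTS[nxt], tmp)
        os.rename(tmp, CURRENT_LINK)
        with open(STATE_FILE, "w") as f:
            f.write(str(nxt))

    if __name__ == "__main__":
        rotate()

Repointing the symlink with rename() is atomic on POSIX, so the web server never observes a half-switched state.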

If for some reason you need all the files in one place, use tmpfs to store links to the files on each partition. This does require mass-deleting millions of links from tmpfs, but it alleviates the performance problem, because only the links have to be deleted, not the file contents; and those links are deleted only from RAM, not from the SSD.
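A sketch of what that link layer could look like, assuming a tmpfs mounted at /mnt/index and the /srv/blobs/current symlink from the previous sketch (both names are illustrative):

    import os
    import uuid

    INDEX_DIR = "/mnt/index"        # a mounted tmpfs: unlinks here touch only RAM
    CURRENT = "/srv/blobs/current"  # symlink to the partition open for writes

    def store(data):
        """Write a blob onto the active partition and link it from tmpfs."""
        name = uuid.uuid4().hex
        real_path = os.path.join(os.path.realpath(CURRENT), name)
        with open(real_path, "wb") as f:
            f.write(data)
        os.symlink(real_path, os.path.join(INDEX_DIR, name))
        return name

    def load(name):
        """Read a blob through its tmpfs link, wherever it currently lives."""
        with open(os.path.join(INDEX_DIR, name), "rb") as f:
            return f.read()

    def prune_dead_links():
        """After a partition is wiped, its links dangle; removing them deletes
        directory entries in RAM only, never file contents on the SSD."""
        for name in os.listdir(INDEX_DIR):
            path = os.path.join(INDEX_DIR, name)
            if not os.path.exists(path):    # exists() follows the symlink
                os.unlink(path)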

Answered 2013-04-29T12:22:55.223

Deletion is a performance problem because both the data and the metadata need to be destroyed on disk.

Do they really need to be separate files? Do the old files really need to be deleted, or is it OK if they are overwritten?

If the answer to the second question is "no", try this:

  • Keep a list of files roughly sorted by age. Perhaps chunk it by file size.
  • When you want to write a new file, find an old file that is preferably bigger than the one you're replacing. Instead of deleting the old file, truncate() it to the appropriate length and then overwrite its contents (see the sketch after this list). Make sure you update the list of old files.
  • Every once in a while, clean up the really old stuff that has not been explicitly replaced.
  • It may be advantageous to keep an index into these files. Try using a tmpfs full of symlinks to the real filesystem.
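A sketch of the truncate-and-overwrite recycling described above, assuming a single writer and no locking (a real version behind a multi-worker web server would need both):

    import os
    import time
    from collections import deque

    DATA_DIR = "/srv/blobs"     # hypothetical storage directory
    MAX_AGE = 20 * 60           # 20 minutes, per the question

    old_files = deque()         # paths, roughly oldest first

    def write_blob(name, data):
        """Reuse an expired file's inode when possible instead of unlinking."""
        if old_files and time.time() - os.path.getmtime(old_files[0]) >= MAX_AGE:
            path = old_files.popleft()
            # Overwrite in place: truncate to the new length, then rewrite.
            with open(path, "r+b") as f:
                f.truncate(len(data))
                f.write(data)
        else:
            path = os.path.join(DATA_DIR, name)
            with open(path, "wb") as f:
                f.write(data)
        old_files.append(path)  # either way, it is now the youngest file
        return path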

You may or may not get a performance advantage in this scheme from chunking the files into subdirectories of manageable size.

If you are OK with multiple pieces of content in the same file:

  • Keep files of similar sizes together by storing each one as an offset into an array of similarly sized files. If every file is 32k or 64k, keep one file full of 32k chunks and one file full of 64k chunks. If files are of arbitrary sizes, round up to the next power of two.
  • You can do lazy deletes here by keeping track of how stale each file is. If you are trying to write and something is stale, overwrite it instead of appending to the end of the file. (See the sketch after this list.)
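Here is one way such a slab scheme could look; the file layout, the 4 KiB minimum size class, and the in-memory free list are all illustrative assumptions:

    import os

    DATA_DIR = "/srv/slabs"            # hypothetical location

    def size_class(n):
        """Round a blob's size up to the next power of two (min 4 KiB)."""
        c = 4096
        while c < n:
            c *= 2
        return c

    class SlabStore:
        """One big file per size class; each blob occupies one fixed slot."""

        def __init__(self):
            self.free = {}        # size class -> reusable (stale) slot numbers
            self.next_slot = {}   # size class -> next never-used slot number

        def _path(self, cls):
            return os.path.join(DATA_DIR, "slab_%d" % cls)

        def put(self, data):
            """Write a blob; returns (size_class, slot) as its address."""
            cls = size_class(len(data))
            free = self.free.setdefault(cls, [])
            if free:
                slot = free.pop()                  # lazy delete: reuse a stale slot
            else:
                slot = self.next_slot.get(cls, 0)  # nothing stale: append
                self.next_slot[cls] = slot + 1
            mode = "r+b" if os.path.exists(self._path(cls)) else "w+b"
            with open(self._path(cls), mode) as f:
                f.seek(slot * cls)
                f.write(data.ljust(cls, b"\0"))    # pad the slot to full size
            return cls, slot

        def get(self, cls, slot, length):
            with open(self._path(cls), "rb") as f:
                f.seek(slot * cls)
                return f.read(length)

        def mark_stale(self, cls, slot):
            """'Deleting' a blob just records its slot as reusable: no disk I/O."""
            self.free.setdefault(cls, []).append(slot)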

Another thought: do you get a performance advantage from truncate()ing all the files to length 0 in inode order and then unlink()ing them? Ignorance stops me from knowing whether this would actually help, but it seems like it would keep the data zeroing grouped together, with the metadata similarly written together.
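A sketch of that experiment, with the two passes kept separate as described; whether it helps is exactly the open question raised above:

    import os
    import time

    DATA_DIR = "/srv/blobs"     # hypothetical storage directory
    MAX_AGE = 20 * 60

    def prune():
        cutoff = time.time() - MAX_AGE
        expired = []
        for entry in os.scandir(DATA_DIR):
            st = entry.stat()
            if entry.is_file() and st.st_mtime < cutoff:
                expired.append((st.st_ino, entry.path))
        expired.sort()          # inode order ~ metadata order on disk
        # Pass 1: free all the data blocks, grouped together.
        for _, path in expired:
            os.truncate(path, 0)
        # Pass 2: destroy the now-cheap metadata, also grouped together.
        for _, path in expired:
            os.unlink(path)

    if __name__ == "__main__":
        prune()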

Yet another thought: XFS has a weaker write-ordering model than ext4 with data=ordered. Is it fast enough on XFS?

Answered 2013-04-29T07:35:36.520