java - How to spread/hash many files across the disk without storing more than 1000 files per directory?

Question

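I need to generate a large number of files indexed by an integer. For example, if int i ranges from 0 to 10000, the generated files would be: f0.xml, f1.xml, ... f10000.xml

However, file system performance starts to degrade once a folder holds more than 1000 files.

I would like to apply a "hash" or "spread" function to the value of i to decide which directory each file goes into. The function should place no more than 1000 files (or folders) in any single directory, and should create directories only as needed.

Are there any ideas or open-source solutions for my problem? Thanks.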

score 3 · Accepted Answer

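Suppose you generate file names from the numbers 00000000 to 99999999. You can use the last 3 digits as the file name and the leading 5 digits as the directory name, so your paths run from 00000/000.xml to 99999/999.xml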
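
The split described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `ShardedPath`, the methods `pathFor` and `prepare`, and the temporary root directory are all assumptions made for the example. Parent directories are created only when a file is about to be written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class ShardedPath {
    // At most 1000 files per directory, as in the question.
    static final int FILES_PER_DIR = 1000;

    // Maps an index in [0, 99999999] to <root>/<leading 5 digits>/<last 3 digits>.xml
    static Path pathFor(Path root, int i) {
        if (i < 0 || i > 99_999_999) {
            throw new IllegalArgumentException("index out of range: " + i);
        }
        String dir = String.format(Locale.ROOT, "%05d", i / FILES_PER_DIR);
        String file = String.format(Locale.ROOT, "%03d.xml", i % FILES_PER_DIR);
        return root.resolve(dir).resolve(file);
    }

    // Creates the parent directory only when it is actually needed.
    static Path prepare(Path root, int i) throws IOException {
        Path p = pathFor(root, i);
        Files.createDirectories(p.getParent());
        return p;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("sharded"); // hypothetical root
        Path p = prepare(root, 1234567);
        Files.write(p, "<doc/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(root.relativize(p)); // directory 01234, file 567.xml
    }
}
```

For the range in the question (0 to 10000) this produces only 11 directories. Note that once the index exceeds 999999, the top level itself would hold more than 1000 directories, so a further level would be needed to keep every directory under the limit.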

Note: unless you have an SSD, performance will degrade with a large number of files on any file system. A typical file access time on an HDD is around 8 ms.

score 1 · Accepted Answer

See this excellent post for a simple solution:

http://michaelandrews.typepad.com/the_technical_times/2009/10/creating-a-hashed-directory-structure.html

import java.io.File;

public class DirectoryHash {
    public static void main(String[] args) {
        String fileName = "cat.gif";

        // hashCode() may be negative; masking keeps each level in 0..255
        int hashcode = fileName.hashCode();
        int mask = 255;
        // lowest 8 bits select the first-level directory
        int firstDir = hashcode & mask;
        // next 8 bits select the second-level directory
        int secondDir = (hashcode >> 8) & mask;

        // builds <sep>NNN<sep>NNN<sep>fileName, at most 256 entries per level
        StringBuilder path = new StringBuilder(File.separator);
        path.append(String.format("%03d", firstDir));
        path.append(File.separator);
        path.append(String.format("%03d", secondDir));
        path.append(File.separator);
        path.append(fileName);

        System.out.println(path);
    }
}

score 0 · Accepted Answer

Take the file name and MD5 it, which gives you something like abcdefghijklmnop

Break it up into a directory structure and file name, e.g.

abc \ def \ ghi \ jkl \ mnop.txt

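You can choose how to break up the directories based on the number of files you want to store (for example, you might only need to split 8 characters as 3, 3, 2 rather than going as deep as the 3, 3, 3, 3 in the example).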
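
A rough sketch of this idea in Java, using the standard `java.security.MessageDigest` API. The class name `Md5Path`, the helper names, and the `levels` / `charsPerLevel` parameters are assumptions introduced for illustration. Unlike the example above, this sketch keeps the original file name as the leaf instead of the hash remainder, which is a deliberate simplification.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Path {
    // Returns the MD5 digest of s as 32 lowercase hex characters.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Uses the first levels * charsPerLevel hex characters as nested
    // directory names and appends the original file name.
    static String pathFor(String fileName, int levels, int charsPerLevel) {
        String hex = md5Hex(fileName);
        if (levels < 0 || charsPerLevel < 1 || levels * charsPerLevel > hex.length()) {
            throw new IllegalArgumentException("split does not fit the digest");
        }
        StringBuilder path = new StringBuilder();
        for (int k = 0; k < levels; k++) {
            path.append(hex, k * charsPerLevel, (k + 1) * charsPerLevel).append('/');
        }
        return path.append(fileName).toString();
    }

    public static void main(String[] args) {
        System.out.println(pathFor("cat.gif", 2, 2));
    }
}
```

With hex output, two characters per level give at most 256 subdirectories per directory, while three characters give up to 4096, so two is the safer choice if the 1000-entry limit matters. A hash only spreads files statistically: the number of files in a leaf directory is not strictly bounded.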

score 0 · Accepted Answer

This is how I implemented it, using some of the tips provided. First, a function that decomposes the index into an array of 3-digit groups:

private static void decompose(final long l, final short[] array) {
    long q = l;
    long r = 0;
    for (int j=array.length-1; j >= 0; j--) {
        // compute remainder
        r = q % 1000;
        // compute quotient
        // integer (long) division: the fractional part is dropped without rounding
        q = q / 1000;

        array[j] = (short) r;
    }
}

Then, use the decomposed array (currentA) to create the subdirectories and the file object.

    File dir = parent;
    for (int j=0; j < depth-1; j++) {
        String dirName = String.format("%03d", currentA[j]);
        dir = new File(dir, dirName);
    }
    // the last group names the file (assumes depth == currentA.length)
    String fileName = prefix + String.format("%03d", currentA[depth - 1]) + suffix;
    File file = new File(dir, fileName);
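
The two fragments above can be combined into one self-contained sketch. It assumes that `depth` equals the number of 3-digit groups (so the last group names the file), and the class name `IndexedFileLayout`, the method `fileFor`, and the `out` parent directory are hypothetical names chosen for the example.

```java
import java.io.File;
import java.util.Locale;

public class IndexedFileLayout {
    // Splits index into `depth` base-1000 groups, most significant first.
    static short[] decompose(long index, int depth) {
        if (index < 0 || depth < 1) {
            throw new IllegalArgumentException("index must be >= 0 and depth >= 1");
        }
        short[] groups = new short[depth];
        long q = index;
        for (int j = depth - 1; j >= 0; j--) {
            groups[j] = (short) (q % 1000); // remainder: next 3 digits
            q /= 1000;                      // integer division drops them
        }
        if (q != 0) {
            throw new IllegalArgumentException("index does not fit in " + depth + " groups");
        }
        return groups;
    }

    // All groups but the last become directories; the last names the file.
    static File fileFor(File parent, long index, int depth, String prefix, String suffix) {
        short[] groups = decompose(index, depth);
        File dir = parent;
        for (int j = 0; j < depth - 1; j++) {
            dir = new File(dir, String.format(Locale.ROOT, "%03d", groups[j]));
        }
        String name = prefix + String.format(Locale.ROOT, "%03d", groups[depth - 1]) + suffix;
        return new File(dir, name);
    }

    public static void main(String[] args) {
        // 1234567 with depth 3 splits into the groups 001, 234, 567
        File f = fileFor(new File("out"), 1234567L, 3, "f", ".xml");
        System.out.println(f.getPath());
        // Call f.getParentFile().mkdirs() before writing to create directories on demand.
    }
}
```

Because every group is in the range 000 to 999, no directory ever holds more than 1000 subdirectories or 1000 files, whatever the depth.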
