java - 解析文件树对象中的许多路径。有没有有效的算法？

Question

我的代码需要创建许多文件路径的文件树作为

dir1/file1
dir1/dir2/file2
dir1/dir2/file3

FileTree 对象可视化示例：

dir1
|_file1
|_dir2
  |_file2
  |_file3

此树用于以图形形式可视化 torrent 内容文件。它还用于动态显示文件进度。在少数子文件夹和文件中它可以有效地工作，但如果路径 > 10,000 则需要大量内存和时间（> 4 秒和 50 MB RAM）。

有没有一种有效的算法来制作这样的图表？对我来说最关键的是图表制作速度。算法实现的示例可以用任何语言编写，这对我来说并不重要 :-) 在此先感谢。

为此目的，我的 Java 代码：

FileTree root = new FileTree(FileTree.ROOT, File.Type.DIR);
FileTree parentTree;

for (String pathToFile : paths) {
    parentTree = root;
    String[] nodes = FileIOUtils.parsePath(pathToFile); /*String.split(File.separator)*/

    for (int i = 0; i < nodes.length; i++) {
            /* The last leaf item is a file */
        if (i == (nodes.length - 1)) {
            parentTree.addChild(new FileTree(nodes[i],
                File.Type.FILE, parentTree));
        } else {
            parentTree.addChild(new FileTree(nodes[i], FileNode.Type.DIR, parentTree));
        }

        FileTree nextParent = parentTree.getChild(nodes[i]);
            /* Skipping leaf nodes */
        if (nextParent != null && !nextParent.isFile()) {
            parentTree = nextParent;
        }
    }
}

文件树类：

public class FileTree {
    public static final String ROOT = "/";
    /* The name for pointer to the parent node */
    public static final String PARENT_DIR = "..";

    protected String name;
    protected boolean isLeaf;
    protected FileTree parent;
    protected Map<String, FileTree> children = new LinkedHashMap<>();

    public FileTree(String name, int type, FileTree parent) {
        this(name, type, parent);
    }

    public FileTree(String name, int type)
    {
        this(name, type, null);
    }

    public FileTree(String name, int type, FileTree parent)
    {
        this.name = name;
        isLeaf = (type == File.Type.FILE);
        this.parent = parent;
    }

    public synchronized void addChild(FileTree node)
    {
        if (!children.containsKey(node.getName())) {
            children.put(node.getName(), node);
        }
    }

    public boolean contains(String name)
    {
        return children.containsKey(name);
    }

    public F getChild(String name)
    {
        return children.get(name);
    }

    public Collection<FileTree> getChildren()
    {
        return children.values();
    }

    public Set<String> getChildrenName()
    {
        return children.keySet();
    }
}

编辑：

可以达到平均 0.5-1 秒（早期 30 秒）创建 1000 个子文件夹的树的速度。

    FileTree root = new BencodeFileTree(FileTree.ROOT, 0L, File.Type.DIR);
    FileTree parentTree = root;
    /* It allows reduce the number of iterations on the paths with equal beginnings */
    String prevPath = "";
    /* Sort reduces the returns number to root */
    Collections.sort(files);

    for (String file : files) {
        String path;
        /*
         * Compare previous path with new path.
         * Example:
         * prev = dir1/dir2/
         * cur  = dir1/dir2/file1
         *        |________|
         *          equal
         *
         * prev = dir1/dir2/
         * cur  = dir3/file2
         *        |________|
         *         not equal
         */
        if (!prevPath.isEmpty() &&
                file.regionMatches(true, 0, prevPath, 0, prevPath.length())) {
            /*
             * Beginning paths are equal, remove previous path from the new path.
             * Example:
             * prev = dir1/dir2/
             * cur  = dir1/dir2/file1
             * new  = file1
             */
            path = file.substring(prevPath.length());
        } else {
            /* Beginning paths are not equal, return to root */
            path = file;
            parentTree = root;
        }

        String[] nodes = FileIOUtils.parsePath(path);
        /*
         * Remove last node (file) from previous path.
         * Example:
         * cur = dir1/dir2/file1
         * new = dir1/dir2/
         */
        prevPath = file.substring(0, file.length() - nodes[nodes.length - 1].length());

        /* Iterates path nodes */
        for (int i = 0; i < nodes.length; i++) {
            if (!parentTree.contains(nodes[i])) {
                /* The last leaf item is a file */
                parentTree.addChild(makeObject(nodes[i], parentTree,
                                i == (nodes.length - 1)));
            }

            FileTree nextParent = parentTree.getChild(nodes[i]);
            /* Skipping leaf nodes */
            if (!nextParent.isFile()) {
                parentTree = nextParent;
            }
        }
    }

score 0 · Accepted Answer

FileTree基本算法对我来说看起来不错，但是当您调用时创建了许多不必要的对象，这些对象在addChild它们已经存在的（常见）情况下会立即被丢弃。您可以尝试将参数传递给构造函数，并且仅在需要插入对象时才构造对象：

public synchronized void addChild(String name, int type, FileTree parent)
{
    if (!children.containsKey(name)) {
        children.put(name, new FileTree(name, type, parent));
    }
}

和：

if (i == (nodes.length - 1)) {
     parentTree.addChild(nodes[i], File.Type.FILE, parentTree);
} else {
     parentTree.addChild(nodes[i], FileNode.Type.DIR, parentTree);
}

可能没有必要传入parentTree：你可以用this.

另一种优化可能是从您处理的上一个路径中维护 String 对象（和关联的 FileTree 节点）的数组，并在添加子项之前进行扫描，直到找到与前一个不同的条目。

score 0 · Accepted Answer

我建议替换LinkedHashMap为，HashMap因为第一个会消耗更多内存。主要区别在于HashMap不保证条目的迭代顺序。但是您可以在 GUI 中订购孩子（可能无论如何您都有一些订购设置）。看看这个问题以获取一些参考资料。

另一个建议是从方法返回实际的子节点addChild

public synchronized FileTree addChild(FileTree node) {
    return children.putIfAbsent(node.getName(), node);
}

然后在循环内部不需要get再次调用地图

FileTree nextParent = parentTree.addChild(...

并且有看起来没有必要的条件

if (nextParent != null && !nextParent.isFile()) {
    parentTree = nextParent;
}

如果当前子项是文件，则看起来循环中不会有迭代。所以它可以安全地替换为

parentTree = parentTree.addChild(...

建议后循环体看起来像

for(...) {
    int type = if (i == (nodes.length - 1)) ? File.Type.FILE : FileNode.Type.DIR;
    parentTree = parentTree.addChild(new FileTree(nodes[i], type, parentTree);
}

java - 解析文件树对象中的许多路径。有没有有效的算法？

2 回答 2

Related

Reference