
Disclaimer: this is a self-answered post; I hope it saves others some time.

Setup:

I've been using Chrome's implementation of the Native File System API [1] [2] [3].

This requires enabling the flag chrome://flags/#native-file-system-api.
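A minimal sketch for checking, before running the code below, whether the page exposes this API at all. It assumes the flag-gated 2019 builds expose chooseFileSystemEntries as a global, as the code in this post does. Later Chrome releases replaced it with showDirectoryPicker. The helper name detectDirectoryPicker is hypothetical.

```javascript
// Hypothetical helper: report which directory-picker entry point the global
// object exposes. chooseFileSystemEntries is the flag-gated API used in this
// post; showDirectoryPicker is its later, standardized replacement.
function detectDirectoryPicker(globalObj = globalThis) {
  if (typeof globalObj.chooseFileSystemEntries === 'function') return 'legacy';
  if (typeof globalObj.showDirectoryPicker === 'function') return 'standard';
  return 'none';
}

// In the browser console:
// detectDirectoryPicker(); // 'none' suggests the flag is off or the API is missing
```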

For starters, I wanted to recursively read a directory and get a list of its files. This is simple:

let paths = [];
let recursiveRead = async (path, handle) => {
    let reads = [];
    // window.handle = handle;
    for await (let entry of await handle.getEntries()) { // <<< HANGING
        if (entry.isFile)
            paths.push(path.concat(entry.name));
        else if (entry.isDirectory /* && some whitelist criteria restricting which dirs are read */)
            reads.push(recursiveRead(path.concat(entry.name), entry));
    }
    await Promise.all(reads);
    console.log('done', path, paths.length);
};

chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
    recursiveRead([], handle).then(() => {
        console.log('COMPLETELY DONE', paths.length);
    });
});
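For comparison, here is a sketch of the same recursive listing against the later, standardized File System Access API. In that API, chooseFileSystemEntries({type: 'openDirectory'}) became showDirectoryPicker(), and getEntries() became the values() async iterator (alongside keys() and entries()). The isFile/isDirectory pair became a single kind property. The shouldDescend predicate is a hypothetical stand-in for the whitelist check. This sketch has not been run against directories of the size discussed here, so it says nothing about whether the hang described below still occurs.

```javascript
// Sketch: recursively collect file paths (as arrays of name segments) from a
// directory handle that follows the standardized API (values(), kind).
// shouldDescend is a hypothetical predicate replacing the whitelist check.
async function listFiles(dirHandle, shouldDescend = () => true, path = []) {
  const paths = [];
  const subdirReads = [];
  for await (const entry of dirHandle.values()) {
    const entryPath = path.concat(entry.name);
    if (entry.kind === 'file') {
      paths.push(entryPath);
    } else if (shouldDescend(entryPath)) {
      // Subdirectories are read concurrently, as in the original code.
      subdirReads.push(listFiles(entry, shouldDescend, entryPath));
    }
  }
  for (const subPaths of await Promise.all(subdirReads)) paths.push(...subPaths);
  return paths;
}

// Browser usage (must run from a user gesture):
// const root = await showDirectoryPicker();
// console.log('COMPLETELY DONE', (await listFiles(root)).length);
```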

I also implemented a non-recursive while-loop/queue version. Finally, I implemented a Node fs.readdir version. All 3 solutions work on small directories.
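The post does not show the Node fs.readdir version. A minimal recursive sketch of that approach uses fs.promises.readdir with withFileTypes: true so that no extra stat call is needed per entry. The function name readTree is hypothetical, and this is not the author's code.

```javascript
const fs = require('fs').promises;
const nodePath = require('path');

// Hypothetical sketch: recursively list files under `root`, returning each
// path as an array of name segments (like `paths` in the browser code).
async function readTree(root, segments = []) {
  const dirents = await fs.readdir(nodePath.join(root, ...segments), { withFileTypes: true });
  const files = [];
  const subdirReads = [];
  for (const dirent of dirents) {
    const entrySegments = segments.concat(dirent.name);
    if (dirent.isFile()) files.push(entrySegments);
    else if (dirent.isDirectory()) subdirReads.push(readTree(root, entrySegments));
  }
  // Merge the results of all subdirectory reads once they finish.
  for (const sub of await Promise.all(subdirReads)) files.push(...sub);
  return files;
}

// Usage: readTree('src').then(paths => console.log('COMPLETELY DONE', paths.length));
```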

Problem:

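But then I tried running them on some subdirectories of the Chromium source code ('base', 'components' and 'chrome'); together, the 3 subdirectories contain about 63,000 files. The Node implementation worked fine (surprisingly, it used cached results between runs, so every run after the first finished almost instantly). Both browser implementations, however, hung.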

Debugging attempts:

Sometimes they return all 63k files and print 'COMPLETELY DONE' as expected. But most of the time (about 90% of runs) they read 10k-40k files and then hang.

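I dug into the hang, and it is apparently the for await line that hangs. So I added the line window.handle = handle before the for loop. When the function hung, I ran the for loop directly in the browser console, and it worked fine! So now I'm stuck: I seem to have working code that randomly hangs.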


Solution:

I tried skipping the directories that hang, by putting a timeout on each directory read:

let whitelistDirs = {src: ['base', 'chrome', 'components', /*'ui'*/]}; // 63800

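let readDirEntry = (handle, timeout = 500) => {
    return new Promise(async (resolve, reject) => {
        // Reject if the listing has not finished within `timeout` ms. The stalled
        // iteration itself cannot be cancelled; we just stop waiting for it.
        let timer = setTimeout(() => reject('timeout'), timeout);
        try {
            let entries = [];
            for await (const entry of await handle.getEntries())
                entries.push(entry);
            resolve(entries);
        } catch (e) {
            reject(e); // surface errors instead of leaving the promise pending
        } finally {
            clearTimeout(timer);
        }
    });
};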

let readWhile = async entryHandle => {
    let paths = [];
    let pending = [{path: [], handle: entryHandle}];
    while (pending.length) {
        let {path, handle} = pending.pop();
        await readDirEntry(handle)
            .then(entries =>
                entries.forEach(entry => {
                    if (entry.isFile)
                        paths.push({path: path.concat(entry.name), handle: entry});
                    else if (path.length || !whitelistDirs[handle.name] || whitelistDirs[handle.name].includes(entry.name))
                        pending.push({path: path.concat(entry.name), handle: entry});
                }))
            .catch(() => console.log('skipped', handle.name));
        console.log('paths read:', paths.length, 'pending remaining:', pending.length, path);
    }
    console.log('read complete', paths.length);
    return paths;
};

chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
    readWhile(handle).then(paths => {
        console.log('COMPLETELY DONE', paths.length);
    });
});
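The same timeout can be expressed more generically by racing the read against a timer and clearing the timer whichever settles first. Below is a sketch using a hypothetical withTimeout helper and a hypothetical collectEntries function that assumes the 2019 getEntries() method used above. Like readDirEntry, it cannot cancel a stalled for await; it only stops waiting for it.

```javascript
// Hypothetical helper: settle with `promise`, or reject with Error('timeout')
// if it has not settled within `ms` milliseconds. The timer is always cleared.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Collect every entry of a directory handle exposing getEntries(), as in the
// 2019 API used in this post.
async function collectEntries(handle) {
  const entries = [];
  for await (const entry of await handle.getEntries()) entries.push(entry);
  return entries;
}

// Equivalent of readDirEntry(handle, timeout) built from the two pieces above.
const readDirEntryRaced = (handle, timeout = 500) =>
  withTimeout(collectEntries(handle), timeout);
```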

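The results showed a pattern. Once one directory read hung and was skipped, the next ~10 directory reads would hang and be skipped as well. After that, reads would work normally again until the next such episode.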

// begins skipping
paths read: 45232 pending remaining: 49 (3) ["chrome", "browser", "favicon"]
VM60:25 skipped extensions
VM60:26 paths read: 45239 pending remaining: 47 (3) ["chrome", "browser", "extensions"]
VM60:25 skipped enterprise_reporting
VM60:26 paths read: 45239 pending remaining: 46 (3) ["chrome", "browser", "enterprise_reporting"]
VM60:25 skipped engagement
VM60:26 paths read: 45266 pending remaining: 45 (3) ["chrome", "browser", "engagement"]
VM60:25 skipped drive
VM60:26 paths read: 45271 pending remaining: 44 (3) ["chrome", "browser", "drive"]
// begins working properly again

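So the problem appears to be transient. I added a simple retry wrapper around readDirEntry that waits 500 ms between tries, and the reads started working properly.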

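// Resolves after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

let readDirEntryRetry = async (handle, timeout = 500, tries = 5, waitBetweenTries = 500) => {
    while (tries--) {
        try {
            return await readDirEntry(handle, timeout);
        } catch (e) {
            console.log('readDirEntry failed, tries remaining:', tries, handle.name);
            // Out of tries: rethrow so the caller's .catch() skips this directory.
            if (!tries)
                throw e;
            await sleep(waitBetweenTries);
        }
    }
};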
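The same retry-after-a-pause idea can be written as a generic helper that is not tied to readDirEntry. In the sketch below, the helper name retry and its options object are hypothetical. It rejects with the last error once every try has failed, so a caller's .catch() (such as the 'skipped' branch in readWhile) still runs.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: call `fn` up to `tries` times, pausing `waitMs` between
// failed attempts. Resolves with the first successful result, or rejects with
// the last error if every attempt fails.
async function retry(fn, { tries = 5, waitMs = 500, onRetry = () => {} } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= tries) throw err;
      onRetry(err, attempt);
      await sleep(waitMs);
    }
  }
}

// Usage with the answer's readDirEntry:
// retry(() => readDirEntry(handle, 500), {
//   onRetry: (e, n) => console.log('readDirEntry failed, attempt', n, handle.name),
// });
```

Retrying the same call after a fixed pause matches what the answer observed: failures come in short bursts, so a brief wait is usually enough for the next read to succeed.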

Conclusion:

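The non-standard Native File System API intermittently hangs when reading large directories. Simply retrying after a short wait works around the problem. It took me a week to arrive at this solution, so I thought it was worth sharing.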

Answered 2019-12-12T17:03:44.283