Finally, we used a plain Set which, unlike an array, stays extremely fast and memory-efficient even with hundreds of thousands of entries. Here is our initial test code:
const fs = require('fs')
const readline = require('readline')

const memory = () => (process.memoryUsage().rss / 1048576).toFixed(2)

const loadFile = (filename, cb) => {
  // this is more complex than simply calling fs.readFile() but
  // means we do not have to buffer the whole file in memory
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(filename)
    const reader = readline.createInterface({ input })
    input.on('error', reject)
    reader.on('line', cb)
    reader.on('close', resolve)
  })
}

const start = Date.now()
const uniqueA = new Set()
const uniqueB = new Set()

// when reading the first file add every line to the set
const handleA = (line) => {
  uniqueA.add(line)
}

// when reading the second file, remove any line already seen in the
// first file; this will leave us with unique lines only
const handleB = (line) => {
  if (uniqueA.has(line)) {
    uniqueA.delete(line)
  } else {
    uniqueB.add(line)
  }
}

console.log(`Starting memory: ${memory()}mb`)

Promise.resolve()
  .then(() => loadFile('uuids-eu.txt', handleA))
  .then(() => {
    console.log(`${uniqueA.size} items loaded into set`)
    console.log(`Memory: ${memory()}mb`)
  })
  .then(() => loadFile('uuids-us.txt', handleB))
  .then(() => {
    const end = Date.now()
    console.log(`Time taken: ${(end - start) / 1000}s`)
    console.log(`Final memory: ${memory()}mb`)
    console.log('Differences A:', Array.from(uniqueA))
    console.log('Differences B:', Array.from(uniqueB))
  })
This gave us the following output (2011 MacBook Air):
Starting memory: 19.71mb
678336 items loaded into set
Memory: 135.95mb
Time taken: 1.918s
Final memory: 167.06mb
Differences A: [ ... ]
Differences B: [ ... ]
Using the "dumb" approach of loading each file whole and splitting on newlines is even faster (~1.2s), but the memory overhead is noticeably higher (~2x).
The Set-based solution we used also has the advantage of skipping the sorting step, which makes it faster than the *nix tools outlined in the original question as well.
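For reference, a typical *nix pipeline for this job has to sort both inputs first, since `comm` requires sorted input. This is a sketch using small stand-in files rather than the original question's exact commands:

```shell
# Stand-in inputs for uuids-eu.txt / uuids-us.txt
printf 'a\nb\nc\n' > eu.txt
printf 'b\nc\nd\n' > us.txt

# comm needs sorted input, hence the explicit sort step the Set
# approach avoids; -3 suppresses lines common to both files, leaving
# lines unique to eu.txt (column 1) and us.txt (column 2, indented)
comm -3 <(sort eu.txt) <(sort us.txt)
```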