javascript - Is there something wrong with how I'm reading a tgz file in Node.js? Benchmarks say it's slow :(

Question

My function benchmark:

mark@ichikawa:~/inbox/D3/read_logs$ time python countbytes.py
bytes: 277464

real    0m0.037s
user    0m0.036s
sys     0m0.000s
mark@ichikawa:~/inbox/D3/read_logs$ time node countbytes.js 
bytes: 277464

real    0m0.144s
user    0m0.120s
sys     0m0.032s

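The measurements were taken on an Ubuntu 13.04 x86_64 machine.

This is the simple version of my benchmark (I also ran 1000 iterations). It shows that the function I wrote to read a tgz file takes more than 3 times as long as the equivalent function I wrote in Python.

For 1000 iterations on a 277 kB file (I used process.hrtime and timeit):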

Node:   30.608409032000015
Python:  6.84210395813

For 1000 iterations on a 9.7 MB file:

Node:   590.491709309999
Python: 200.796745062
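The timing harness itself is not shown in the question. As a rough illustration only, a sequential loop timed with `process.hrtime.bigint()` could look like the sketch below. The name `timeIterations` and its callback signature are assumptions made for this example, not the asker's code, and `process.hrtime.bigint()` requires a newer Node than the one available in 2013.

```javascript
// Hypothetical harness (not from the question): runs an async task
// `iterations` times, one after another, and reports elapsed seconds.
function timeIterations(task, iterations, done) {
  const start = process.hrtime.bigint(); // monotonic clock, in nanoseconds
  let remaining = iterations;

  function runNext(err) {
    if (err) return done(err);
    if (remaining-- === 0) {
      const elapsedNs = process.hrtime.bigint() - start;
      return done(null, Number(elapsedNs) / 1e9);
    }
    // The task must call runNext when it has finished.
    task(runNext);
  }

  runNext();
}

// Example usage with a placeholder task that finishes on the next tick.
timeIterations((cb) => setImmediate(cb), 1000, (err, seconds) => {
  if (err) throw err;
  console.log('elapsed seconds: ' + seconds);
});
```

Timing inside the process this way leaves out interpreter startup, which the single-run `time node countbytes.js` measurement above includes.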

If you have any ideas on how to speed up reading tgz files, please let me know.

Here is the code:

var fs = require('fs');
var tar = require('tar');
var zlib = require('zlib');
var Stream = require('stream');


var countBytes = new Stream;
countBytes.writable = true;
countBytes.count = 0;
countBytes.bytes = 0;

countBytes.write = function (buf) {
    countBytes.bytes += buf.length;
};

countBytes.end = function (buf) {
    if (arguments.length) countBytes.write(buf);

    countBytes.writable = false;
    console.log('bytes: ' + countBytes.bytes);
};

countBytes.destroy = function () {
    countBytes.writable = false;
};


fs.createReadStream('supercars-logs-13060317.tgz')
    .pipe(zlib.createUnzip())
    .pipe(tar.Extract({path: "responsetimes.log.13060317"}))
    .pipe(countBytes);

Any idea how to speed this up?

score 0 · Accepted Answer

It looks fine to me, but I'm curious why you are using the tar stream at all?

I would use a Transform to implement countBytes. I like to use through2 for this:

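var fs = require('fs')
, zlib = require('zlib')
, thr = require('through2')
, cache = {bytes: 0}
;
fs.createReadStream('supercars-logs-13060317.tgz')
  .pipe(zlib.createUnzip())
  .pipe(thr(function(chunk, enc, next){
    // count the decompressed bytes; nothing reads downstream,
    // so the chunk is not pushed on (that would stall the stream)
    cache.bytes += chunk.length
    next()
  }))
  // 'finish' fires once every chunk has been written;
  // 'end' would only fire if something consumed the output
  .on('finish', function(){
    console.log('bytes: ' + cache.bytes)
  })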
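The same counting idea can also be sketched with only Node's core modules, for anyone who would rather not add through2. This is a sketch and not the answerer's code: `countUnzippedBytes` is a hypothetical helper name, and it assumes a Node version that provides `stream.pipeline` (Node 10 or later). Since nothing is passed downstream, a plain `Writable` is used instead of a `Transform`.

```javascript
const { Writable, pipeline } = require('stream');
const zlib = require('zlib');

// Hypothetical helper: decompresses `source` and counts the resulting bytes.
// Calls callback(err, bytes) once the whole stream has been consumed.
function countUnzippedBytes(source, callback) {
  let bytes = 0;

  const counter = new Writable({
    write(chunk, encoding, next) {
      bytes += chunk.length; // chunk is a Buffer of decompressed data
      next();
    }
  });

  // pipeline wires up backpressure and forwards errors from any stage.
  pipeline(source, zlib.createUnzip(), counter, (err) => callback(err, bytes));
}
```

It could be called as `countUnzippedBytes(fs.createReadStream('supercars-logs-13060317.tgz'), function (err, bytes) { ... })`. Note that this counts the bytes of the decompressed tar archive, including tar headers and padding, so the total may differ from the byte count of the extracted log file alone.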
