node.js - Reading file in segments of X number of lines

Question

I have a file with a lot of entries (10+ million), each representing a partial document that is being saved to a mongo database (based on some criteria, non-trivial).

To avoid overloading the database (which is doing other operations at the same time), I wish to read in chunks of X lines, wait for them to finish, read the next X lines, etc.

Is there any way to use any of the fs callback mechanisms to also "halt" progress at a certain point, without blocking the entire program? From what I can tell, they all run from start to finish with no way of stopping them, unless you stop reading the file entirely.

The issue is that, because of the file size, memory also becomes a problem: the updates take so long that a LOT of the data ends up held in memory, exceeding the 1 GB limit and crashing the program. Secondly, as I said, I don't want to queue 1 million updates and completely stress the mongo database.

Any and all suggestions welcome.

UPDATE: Final solution using line-reader (available via npm) below, in pseudo-code.

var lineReader = require('line-reader');

var filename = 'yourfile.txt'; // placeholder: wherever you get it from
lineReader.eachLine(filename, function(line, last, cb) {
    //
    // Do work here, line contains the line data
    // last is true if it's the last line in the file
    //

    function checkProcessed(callback) {
        if (doneProcessing()) { // Implement doneProcessing to check whether whatever you are doing is done
             callback();
        }
        else {
             setTimeout(function() { checkProcessed(callback); }, 100); // Adjust timeout according to the expected time to process one line
        }
    }

    checkProcessed(cb);
});

This is implemented to make sure doneProcessing() returns true before attempting to work on more lines - this means you can effectively throttle whatever you are doing.
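One possible way to implement doneProcessing() is to count the operations that are still in flight. The sketch below is only an illustration: createTracker, start, finish and the limit parameter are hypothetical names that are not part of line-reader or of the solution above.

```javascript
// Hypothetical helper: counts asynchronous operations that have been
// started but have not finished yet.
function createTracker(limit) {
    var pending = 0;
    return {
        // Call right before issuing an update.
        start: function() { pending += 1; },
        // Call from the update's completion callback.
        finish: function() { pending -= 1; },
        // True while fewer than `limit` operations are outstanding.
        doneProcessing: function() { return pending < limit; }
    };
}
```

With something like this, each line would call start() before its update and finish() in the update's callback, and checkProcessed would poll the tracker's doneProcessing().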

score 2 · Accepted Answer

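I don't use MongoDB and I'm not an expert in using Lazy, but I think something like the following might work or give you some ideas. (Note that I have not tested this code.)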

var fs   = require('fs'),
    lazy = require('lazy');

var readStream = fs.createReadStream('yourfile.txt');

var file = lazy(readStream)
  .lines                     // ask to read stream line by line
  .take(100)                 // and read 100 lines at a time.
  .join(function(onehundredlines){
      readStream.pause();    // pause reading the stream
      writeToMongoDB(onehundredlines, function(err){
        // error checking goes here
        // resume the stream 1 second after MongoDB finishes saving.
        // (wrapped in a function so resume() keeps readStream as `this`)
        setTimeout(function() { readStream.resume(); }, 1000);
      });
  });
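As an alternative to the pause/resume idea above, here is a minimal sketch that uses only Node's built-in fs and readline modules (it needs a Node version that supports `for await` over a readline interface). It swaps Lazy for a different technique and is not the approach from the answer itself. processInBatches and handleBatch are hypothetical names, and handleBatch is assumed to return a promise that resolves once the batch has been saved.

```javascript
const fs = require('fs');
const readline = require('readline');

// Reads `filename` line by line and passes the lines to `handleBatch`
// in arrays of at most `batchSize`. No further batch is handed over
// until the promise returned by `handleBatch` has resolved.
async function processInBatches(filename, batchSize, handleBatch) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity // treat \r\n as a single line break
  });

  let batch = [];
  for await (const line of rl) {
    batch.push(line);
    if (batch.length === batchSize) {
      await handleBatch(batch); // wait for this batch before continuing
      batch = [];
    }
  }
  if (batch.length > 0) {
    await handleBatch(batch); // flush the final, partial batch
  }
}
```

It could be called as processInBatches('yourfile.txt', 100, saveBatch), where saveBatch is whatever function writes an array of lines to the database and returns a promise. readline may still buffer some lines ahead internally, so this bounds the number of queued updates rather than guaranteeing that exactly one batch is in memory.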
