I am new to the asynchronous control flow of Node.js. My scraper works, but I can't help thinking that there must be a more optimal (more elegant?) way of doing it. I am open to using other Node libraries. More specifically:
- I feel that the current control flow (with all the callbacks) is hard to read, but maybe that is just because it is new to me. There seem to be several control-flow libraries; should I be using one?
- Originally, my code made all the `request` calls first, parsed and saved everything in a `records = []` array, then proceeded to write everything to file. I changed the code here so that it does `request` - parse - append for each record in the loop. I would like to confirm whether this approach is better with a large number of requests.
- Writing the records in JSON format caused some pain. Currently I have to call `startStep` to append the `[` first, then use `(flag? function(){flag = false; return "";}() : ",")` to decide whether it is the first record (if not, a comma is appended first), then append all the records, then append `]` at the end. Again, I'm curious whether there is a better way of doing this.

To iterate, I am declaring the list in the global scope and using `list.shift()` to move on to the next item. It seems to be fine for now, but I think this will cause side effects at a larger scale. My intuition is that I should pass the array as an argument. Again, I would like to get confirmation on this point.

    var fs = require('fs');
    var request = require("request");
    var cheerio = require("cheerio");

    function appendFile(_input, callback){
        fs.appendFile("./TED/alltalk3.json", _input, function(err){
            if(err){
                console.log("input is" + _input + "error is :" + err);
            }
            else{
                callback();
            }
        });
    }

    function startStep(){
        appendFile("[", function(){
            console.log("--start--");
            getOneDay(list.shift());
        })
    }

    function finalStep(){
        appendFile("]", function(){
            console.log("--end--");
            return;
        })
    }

    var flag = true; // first item no comma

    function getOneDay(itm){
        if(itm){
            request("http://www.ted.com/talks/view/id/" + itm, function(error, response, body) {
                var $ = cheerio.load(body)
                var record = {};
                record["title"] = $("#altHeadline").text();
                appendFile(
                    (flag? function(){flag = false; return "";}() : ",") + (JSON.stringify(record, null, 4)),
                    function(){
                        return getOneDay(list.shift());
                    }
                )
            });
        }
        else{
            return finalStep();
        }
    }

    var list = [];
    for(var i = 1; i < 5; i++){
        list.push(i);
    }
    startStep();
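For the last two points, here is a minimal sketch of one possible alternative, not a tested replacement for the code above. It passes the list in as an argument and derives the comma from the item's index instead of a global `flag`. The names `formatRecord`, `processAll`, `fetchRecord`, `append` and `done` are hypothetical and introduced only for illustration. `fetchRecord` and `append` are assumed to follow the usual Node `callback(err, result)` convention.

```javascript
// Hypothetical helper: builds the text to append for the record at `index`.
// Only the first record (index 0) is written without a leading comma.
function formatRecord(record, index) {
    var separator = index === 0 ? "" : ",";
    return separator + JSON.stringify(record, null, 4);
}

// Processes `items` one at a time. The array is passed in as an argument
// and is never mutated, so no global list or global flag is needed.
// `fetchRecord(item, cb)` and `append(text, cb)` are injected by the caller.
function processAll(items, fetchRecord, append, done) {
    function step(index) {
        if (index >= items.length) {
            // All items handled: close the JSON array and finish.
            return append("]", done);
        }
        fetchRecord(items[index], function (err, record) {
            if (err) { return done(err); }
            append(formatRecord(record, index), function (err) {
                if (err) { return done(err); }
                step(index + 1);
            });
        });
    }

    // Open the JSON array, then start with the first item.
    append("[", function (err) {
        if (err) { return done(err); }
        step(0);
    });
}
```

In the scraper, `fetchRecord` would presumably wrap the `request` and `cheerio` calls and hand back the `record` object, and `append` would wrap `fs.appendFile` with the output path. Because both are passed in, the flow can also be exercised with in-memory stand-ins, without touching the network or the disk.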