
I am new to the asynchronous control flow of Node.js. My scraper works, but I can't help thinking that there must be a better (more elegant?) way of doing it, and I am open to using other Node libraries. More specifically:

  1. I feel that the current control flow (with all the callbacks) is hard to read, but maybe that's just because it is new to me. There seem to be several control-flow libraries; should I be using one?
  2. Originally, my code made all the requests first, parsed them and saved everything into records = [], then proceeded to write everything to the file. I changed the code so that it does request - parse - append for each record in turn. I would like to confirm whether this approach is better for a large number of requests.
  3. Writing the records in JSON format caused some pain. Currently I call startStep to append the [ first, then use (flag? function(){flag = false; return "";}() : ",") to decide whether it's the first record (if not, a comma is appended first), append all the records, and finally append ] at the end. Again, I'm curious whether there is a better way of doing this.
  4. To iterate, I declare the list in the global scope and use list.shift() to get the next item. It seems fine for now, but I think it will cause side effects at a larger scale. My intuition is that I should pass the array as an argument. Again, I would like confirmation on this point.

    var fs = require('fs');
    var request = require("request");
    var cheerio = require("cheerio");
    
    function appendFile(_input, callback){
      fs.appendFile("./TED/alltalk3.json", _input, function(err){
        if(err){
          console.log("input is" + _input + "error is :" + err);
        }
        else{
          callback();
        }
      });
    }
    
    function startStep(){  
      appendFile("[", function(){
        console.log("--start--");
        getOneDay(list.shift());
      })  
    }
    
    function finalStep(){
      appendFile("]", function(){
        console.log("--end--");
        return;
      })
    }
    
    
    var flag = true; // first item no comma
    function getOneDay(itm){  
      if(itm){      
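        request("http://www.ted.com/talks/view/id/" + itm, function(error, response, body) {
          if(error){
            // log the failure and move on to the next id instead of parsing an undefined body
            console.log("request for " + itm + " failed: " + error);
            return getOneDay(list.shift());
          }
          var $ = cheerio.load(body);
          var record = {};
          record["title"] = $("#altHeadline").text();
    
          appendFile(
            (flag? function(){flag = false; return "";}() : ",") + (JSON.stringify(record, null, 4)), 
            function(){
              return getOneDay(list.shift());
            }
          );
        });    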
      }
      else{
        return finalStep();
      }
    }
    
    var list = [];
    for(var i = 1; i < 5; i++){
      list.push(i);
    }
    startStep();
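
For points 3 and 4, here is a minimal sketch that keeps the same one-request-at-a-time flow but passes the ids as an argument instead of using a global list, and picks the comma from the item's index instead of a global flag. `fetchTitle` and `scrapeAll` are hypothetical names; `fetchTitle` is an offline stub standing in for the request + cheerio step, and the output goes to a temp file rather than ./TED/alltalk3.json so the sketch runs anywhere.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical offline stand-in for the request + cheerio step. A real version
// would call request(url, function (error, response, body) { ... }) and pass
// back $("#altHeadline").text().
function fetchTitle(id, callback) {
  setImmediate(function () {
    callback(null, "Talk " + id);
  });
}

// Fetches every id in order, one at a time, and writes a JSON array to outFile.
function scrapeAll(ids, outFile, done) {
  // writeFile (not appendFile) truncates any output left by a previous run.
  fs.writeFile(outFile, "[", function (err) {
    if (err) return done(err);
    step(0);
  });

  function step(i) {
    if (i === ids.length) {
      return fs.appendFile(outFile, "]", done); // close the array and finish
    }
    fetchTitle(ids[i], function (err, title) {
      if (err) return done(err);
      var prefix = i === 0 ? "" : ","; // comma before every record except the first
      var json = JSON.stringify({ title: title }, null, 4);
      fs.appendFile(outFile, prefix + json, function (err) {
        if (err) return done(err);
        step(i + 1);
      });
    });
  }
}

scrapeAll([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"), function (err) {
  if (err) return console.error(err);
  console.log("--end--");
});
```

Writing the opening [ with fs.writeFile also means that running the script twice does not leave two concatenated arrays in the file, which appending the [ would.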
    

2 Answers


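What you are trying to implement with your code is a finite state machine (FSM), a common pattern in asynchronous programming. Some languages have built-in support for it. For example, C# 5.0 has async/await, which greatly simplifies asynchronous programming by giving us a familiar, linear code flow.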
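
To make the state machine in the question explicit, here is an illustrative sketch (not code from this answer) in which every I/O callback names the state to enter next: "start" writes [, "next" fetches and appends one record, and "end" writes ]. `runMachine` and `fetchTitle` are hypothetical names, and `fetchTitle` is an offline stub for the request + cheerio step.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical offline stand-in for the request + cheerio step.
function fetchTitle(id, callback) {
  setImmediate(function () {
    callback(null, "Talk " + id);
  });
}

// The scraper as an explicit state machine: "start" -> "next" (once per id,
// plus one final check) -> "end". Each I/O callback names the next state.
function runMachine(ids, outFile, done) {
  var index = 0; // the machine's data: position of the next id to fetch

  function transition(state) {
    switch (state) {
      case "start":
        return fs.writeFile(outFile, "[", moveTo("next"));
      case "next":
        if (index === ids.length) return transition("end");
        var i = index++;
        return fetchTitle(ids[i], function (err, title) {
          if (err) return done(err);
          var text = (i === 0 ? "" : ",") + JSON.stringify({ title: title }, null, 4);
          fs.appendFile(outFile, text, moveTo("next"));
        });
      case "end":
        return fs.appendFile(outFile, "]", done);
      default:
        return done(new Error("unknown state: " + state));
    }
  }

  // Returns an fs callback that enters nextState unless the write failed.
  function moveTo(nextState) {
    return function (err) {
      if (err) return done(err);
      transition(nextState);
    };
  }

  transition("start");
}

runMachine([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"), function (err) {
  if (err) return console.error(err);
  console.log("--end--");
});
```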

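There have already been some attempts to bring async/await to JavaScript. I believe it is only a matter of time before Node.js and all major web browsers fully support it.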
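
async/await has since landed in JavaScript and is enabled by default in Node.js from 7.6 onward. Below is a sketch of the same loop written with it, assuming a Node.js release that provides `fs.promises`; `scrapeAll` and `fetchTitle` are hypothetical names, and `fetchTitle` is a Promise-returning offline stub in place of request + cheerio.

```javascript
const fsp = require("fs").promises;
const os = require("os");
const path = require("path");

// Hypothetical offline stand-in for request + cheerio, returning a Promise.
function fetchTitle(id) {
  return new Promise((resolve) => setImmediate(() => resolve("Talk " + id)));
}

// Reads top to bottom like synchronous code, but each await yields to the
// event loop while the fetch or the write is in progress.
async function scrapeAll(ids, outFile) {
  await fsp.writeFile(outFile, "[");
  for (let i = 0; i < ids.length; i++) {
    const title = await fetchTitle(ids[i]); // one request at a time
    const prefix = i === 0 ? "" : ",";
    await fsp.appendFile(outFile, prefix + JSON.stringify({ title }, null, 4));
  }
  await fsp.appendFile(outFile, "]");
}

scrapeAll([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"))
  .then(() => console.log("--end--"))
  .catch((err) => console.error(err));
```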

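Until then, the most common asynchronous control-flow pattern in JavaScript is the Promise. A Promise represents the result of an operation that will complete in the future, and lets you act on that result with a JavaScript callback once it completes. I recommend sticking to this pattern in your code.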
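
With plain Promises (no async/await), one way to run the steps strictly in sequence is to chain them with Array.prototype.reduce. This is a sketch under the same assumptions as above: `scrapeAll` and `fetchTitle` are hypothetical names, `fetchTitle` is an offline stub, and `fs.writeFile`/`fs.appendFile` are wrapped with `util.promisify` (available from Node.js 8).

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");
var util = require("util");

var writeFileAsync = util.promisify(fs.writeFile);
var appendFileAsync = util.promisify(fs.appendFile);

// Hypothetical offline stand-in for request + cheerio, returning a Promise.
function fetchTitle(id) {
  return new Promise(function (resolve) {
    setImmediate(function () {
      resolve("Talk " + id);
    });
  });
}

// Builds one chain: write "[", then fetch + append each id in order, then "]".
// reduce links every step onto the previous Promise, so steps never overlap.
function scrapeAll(ids, outFile) {
  return ids
    .reduce(function (chain, id, i) {
      return chain
        .then(function () {
          return fetchTitle(id);
        })
        .then(function (title) {
          var prefix = i === 0 ? "" : ",";
          return appendFileAsync(outFile, prefix + JSON.stringify({ title: title }, null, 4));
        });
    }, writeFileAsync(outFile, "["))
    .then(function () {
      return appendFileAsync(outFile, "]");
    });
}

scrapeAll([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"))
  .then(function () { console.log("--end--"); })
  .catch(function (err) { console.error(err); });
```

A single .catch at the end receives an error from any step, which replaces the per-callback error checks of the callback version.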

More resources:

answered 2013-11-02T13:52:35.070

I strongly recommend taking a look at https://github.com/caolan/async, in particular its forEachSeries method; it looks like exactly what you need.
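
The shape of a forEachSeries-based version is sketched below. To keep it runnable without installing the library, `forEachSeries` here is a hypothetical hand-written stand-in with the same calling convention (iterator(item, callback) for each item in turn, then one final callback receiving the first error); with caolan/async installed you would call async.forEachSeries(ids, iterator, done) instead. `scrapeAll` and `fetchTitle` are hypothetical names, and `fetchTitle` is an offline stub.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical stand-in for async.forEachSeries(arr, iterator, done): calls
// iterator(item, next) for one item at a time, then calls done once with the
// first error or null. Not the library's code, just the same contract.
function forEachSeries(arr, iterator, done) {
  var i = 0;
  function next(err) {
    if (err) return done(err);
    if (i === arr.length) return done(null);
    iterator(arr[i++], next);
  }
  next();
}

// Hypothetical offline stand-in for the request + cheerio step.
function fetchTitle(id, callback) {
  setImmediate(function () {
    callback(null, "Talk " + id);
  });
}

function scrapeAll(ids, outFile, done) {
  var first = true; // scoped to this run instead of a global flag
  fs.writeFile(outFile, "[", function (err) {
    if (err) return done(err);
    forEachSeries(
      ids,
      function (id, next) {
        fetchTitle(id, function (err, title) {
          if (err) return next(err);
          var prefix = first ? "" : ",";
          first = false;
          fs.appendFile(outFile, prefix + JSON.stringify({ title: title }, null, 4), next);
        });
      },
      function (err) {
        if (err) return done(err);
        fs.appendFile(outFile, "]", done);
      }
    );
  });
}

scrapeAll([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"), function (err) {
  if (err) return console.error(err);
  console.log("--end--");
});
```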

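In this particular case I would also recommend using the synchronous fs methods. Blocking methods are not recommended in a service, but they are fine for a shell-like script.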
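
A sketch of that suggestion: the requests stay asynchronous, but the file writes use fs.writeFileSync and fs.appendFileSync, which removes one level of callbacks from every step. `scrapeAll` and `fetchTitle` are hypothetical names, and `fetchTitle` is an offline stub for the request + cheerio step.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical offline stand-in for the request + cheerio step.
function fetchTitle(id, callback) {
  setImmediate(function () {
    callback(null, "Talk " + id);
  });
}

// Requests stay asynchronous; only the file writes block. That is acceptable
// in a one-off script, but in a server it would stall everything else on the
// event loop while each write completes.
function scrapeAll(ids, outFile, done) {
  fs.writeFileSync(outFile, "[");
  (function step(i) {
    if (i === ids.length) {
      fs.appendFileSync(outFile, "]");
      return done(null);
    }
    fetchTitle(ids[i], function (err, title) {
      if (err) return done(err);
      var prefix = i === 0 ? "" : ",";
      fs.appendFileSync(outFile, prefix + JSON.stringify({ title: title }, null, 4));
      step(i + 1);
    });
  })(0);
}

scrapeAll([1, 2, 3, 4], path.join(os.tmpdir(), "alltalk3.json"), function (err) {
  if (err) return console.error(err);
  console.log("--end--");
});
```

If all the records fit in memory, an even simpler take on point 3 is to collect them in an array and call fs.writeFileSync(outFile, JSON.stringify(records, null, 4)) once at the end, which removes the manual handling of [, commas and ] entirely.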

answered 2013-11-02T06:16:39.290