
I'm working on a project named robot, hosted on GitHub. Its job is to fetch media from the URLs given in an XML config file. The config file has a defined format, which you can see in the scripts dir.

My problem is as follows. There are two arguments:

  1. A list that indicates how deep the web link is. Using the CSS selector in each list item, I can find either the media URL or the URL of a sub page where I may finally find the media.
  2. An array that contains the sub page URLs.

A simplified example:

node_list = {..., next: {..., next: null}};
url_arr = [urls];

I want to iterate over all the items in url_arr, so I do the following:

var http = require('http');

function fetch(url, node) {
    if (node == null)
        return;
    // do something with an HTTP request here (placeholder address)
    var req = http.get('http://www.google.com', function (res) {
        var data = '';
        res.on('data', function (chunk) {
            data += chunk;
        }).on('end', function () {
            // maybe generate more new URLs here
            // get another url_list
            node = node.next;
            fetch(url_new, node); // url_new: a URL extracted from data
        });
    });
}

// this needs to run synchronously
url_arr.forEach(function (url) {
    fetch(url, node);
});

As you can see, if I use asynchronous HTTP requests, this will eat all the system resources, and I cannot control the process. Does anyone have a good idea for solving this problem? Or is Node.js not the right tool for this kind of job?


1 Answer


If the problem is that you get too many HTTP requests simultaneously, you could change the fetch function to operate on a stack of URLs.

Basically you would do this:

  • When fetch is called, insert the URL into the stack and check whether a request is in progress.
  • If no request is running, pick the first URL from the stack and process it; otherwise do nothing.
  • When an HTTP request finishes, have it take a new URL from the stack and process that.

This way you can have the for-loop add all the URLs as it does now, but only one URL is processed at a time, so there won't be too many resources in use.
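The steps above could be sketched roughly as follows. This is only a minimal illustration, not a drop-in replacement for the fetch function in the question: createFetchQueue and requestFn are hypothetical names, requestFn is assumed to be a callback-style function that calls done() exactly once when its request has finished, and the sketch takes URLs in first-in, first-out order (which is one reading of "pick the first URL").

```javascript
// Hypothetical helper: serializes work so that only one request runs at a time.
// requestFn(url, done) is an assumed callback-style function that must call
// done() exactly once when the request for that URL has finished.
function createFetchQueue(requestFn) {
    var pending = [];      // URLs waiting to be processed
    var running = false;   // true while a request is in flight

    function next() {
        if (running || pending.length === 0) {
            return;        // busy, or nothing left to do
        }
        running = true;
        var url = pending.shift();   // take the oldest waiting URL
        requestFn(url, function () {
            running = false;
            next();                  // move on to the following URL, if any
        });
    }

    return {
        // Add a URL; processing starts only if no request is in flight.
        fetch: function (url) {
            pending.push(url);
            next();
        },
        // Number of URLs still waiting (not counting the one in flight).
        size: function () {
            return pending.length;
        }
    };
}
```

With Node's http module, requestFn might wrap http.get(url, ...) and call done from the response's 'end' handler (and from the request's 'error' handler, so a failed request does not stall the queue). New URLs discovered while parsing a page can be passed to the same fetch method, and they will wait their turn like the rest.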

Answered 2013-07-20T19:28:12.023