I'm working on a project named robot, hosted on GitHub. Its job is to fetch media from URLs given in an XML config file. The config file has a defined format, which you can see in the scripts directory.
My problem is as follows. There are two inputs:
- A linked list whose length indicates how deep the web links go. Using the selector (a CSS selector) in each list item, I can find either the media URL or the URL of a sub-page where I may eventually find the media.
- An array that contains the sub-page URLs.
A simplified example:
    node_list = {..., next: {..., next: null}};
    url_arr = [urls];
I want to process every item in url_arr, so I do the following:
    var http = require('http');

    function fetch(url, node) {
        if (node == null) {
            return;
        }
        // do something with an HTTP request here
        var req = http.get(url, function (res) {
            var data = '';
            res.on('data', function (chunk) {
                data += chunk;
            }).on('end', function () {
                // maybe generate more new URLs here
                // get another url_list (url_new stands for a URL found in data)
                node = node.next;
                fetch(url_new, node);
            });
        });
    }
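    // this part needs to run in sync
    // (a for...in loop would yield array indices, not the URLs themselves)
    for (var i = 0; i < url_arr.length; i++) {
        fetch(url_arr[i], node_list);
    }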
As you can see, with asynchronous HTTP requests the loop starts every request at once, which can eat up all the system's resources, and I cannot control the process. Does anyone have a good idea for solving this problem? Or is Node.js not the right tool for this kind of job?
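To make the question concrete, this is roughly the kind of control I am after: a queue that keeps at most a fixed number of requests in flight and starts the next one only when a slot frees up. It is only a sketch under my own assumptions, not code from the project. The names createQueue, limit, worker and onDrain are made up for illustration, and the worker would be where the http.get call from fetch goes.

```javascript
// Hypothetical helper (not part of the project): runs `worker` on queued
// tasks while keeping at most `limit` of them active at the same time.
function createQueue(limit, worker, onDrain) {
    var pending = [];      // tasks waiting for a free slot
    var active = 0;        // tasks currently running
    var scheduled = false; // true while a pump() call is already queued
    var idle = true;       // true once onDrain has fired for the current batch

    function pump() {
        scheduled = false;
        // Start waiting tasks until the concurrency cap is reached.
        while (active < limit && pending.length > 0) {
            active++;
            worker(pending.shift(), makeDone());
        }
        // Nothing running and nothing waiting: the whole batch is finished.
        if (active === 0 && pending.length === 0 && !idle) {
            idle = true;
            if (onDrain) {
                onDrain();
            }
        }
    }

    function schedule() {
        if (!scheduled) {
            scheduled = true;
            setImmediate(pump);
        }
    }

    // Each task gets its own completion callback that only counts once.
    function makeDone() {
        var called = false;
        return function () {
            if (called) {
                return;
            }
            called = true;
            active--;
            schedule();
        };
    }

    return {
        // A worker may push follow-up tasks (e.g. sub-page URLs paired with
        // node.next), as long as it does so before calling its done callback.
        push: function (task) {
            idle = false;
            pending.push(task);
            schedule();
        }
    };
}
```

With something like this, the loop over url_arr would push one task per URL (for example an object holding the URL and the current node) instead of calling fetch directly, and the 'end' handler would push the newly found URLs and then call the done callback.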