
I am relatively new to Node.js and I am trying to get more familiar with it by writing a simple module. The module's purpose is to take an id, scrape a website, and return an array of dictionaries containing the data.

The data on the website is scattered across pages, where each page is accessed by a different index number in the URI. I've defined a function that takes the id and page_number, scrapes the website via http.request() for that page_number, and on the end event passes the data to another function that applies some RegEx to extract the data in a structured way.

In order for the module to have complete functionality, all the available page_nums of the website should be scraped.

Is it ok by Node.js style/philosophy to create a standard for() loop to call the scraping function for every page, aggregate the results of every return, and then return them all at once from the exported function?

EDIT

I figured out a solution based on help from #node.js on freenode. You can find the working code at http://github.com/attheodo/katina_node

Thank you all for the comments.


3 Answers

The common method, if you don't want to bother with one of the libraries mentioned by @ControlAltDel, is to set a counter equal to the number of pages. As each page is processed (asynchronously, so you don't know in what order, nor do you care), you decrement the counter. When the counter reaches zero, you know you've processed all pages and can move on to the next part of the process.
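The countdown pattern described above can be sketched as follows. `fetchPage` is a stand-in for the asker's actual scraping function, and the result-slotting by index is one way to keep page order despite out-of-order completion:

```javascript
// Fire off all page requests, decrement a counter as each one finishes,
// and invoke the final callback once every page has been processed.
function scrapeAllPages(id, pageCount, fetchPage, done) {
  var remaining = pageCount;
  var results = new Array(pageCount);
  for (var i = 0; i < pageCount; i++) {
    (function (page) {              // closure so each callback keeps its index
      fetchPage(id, page, function (err, data) {
        if (err) return done(err);
        results[page] = data;       // slot by index, so completion order doesn't matter
        if (--remaining === 0) done(null, results);
      });
    })(i);
  }
}
```

Note the immediately-invoked function around the loop body: without it, every callback would close over the same `i` (this predates `let`).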

Answered 2012-07-05T15:07:28.887

With the help of useful comments from #node.js on Freenode, I managed to find a solution, as the Node.js philosophy requires, by calling the scraping function sequentially with attached callbacks.

You can find the code here: https://github.com/attheodo/katina_node/blob/master/lib/katina.js

The code block of interest lies between lines 87 and 114.
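The sequential-with-callbacks approach can be sketched like this; `fetchPage` and `pageCount` are stand-ins here, not the actual names used in katina.js:

```javascript
// Scrape page 0; in its callback kick off page 1, and so on,
// appending each page's data before moving to the next one.
function scrapeSequentially(id, pageCount, fetchPage, done) {
  var results = [];
  (function next(page) {
    if (page === pageCount) return done(null, results);
    fetchPage(id, page, function (err, data) {
      if (err) return done(err);
      results.push(data);   // pages arrive strictly in order
      next(page + 1);
    });
  })(0);
}
```

Compared to the counter approach, this issues only one request at a time, which is slower but gentler on the scraped site and keeps results trivially ordered.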

Thank you all.

Answered 2012-07-06T14:53:55.827

The problem you will likely run into is reassembling all the aggregated results. There are several libraries that can help, including Async and Step. Alternatively, you could use a promise library such as Fibers.Promise. The latter, however, is not really the Node philosophy and requires direct changes/additions to the Node executable.

Answered 2012-07-05T15:00:32.697