
I'm writing a Python script that will scrape some pages from my web server and save them to a file. I'm using the mechanize.Browser() class for this particular task.

However, I've found that a single instance of mechanize.Browser() is rather slow. Is there a way I could relatively painlessly use multithreading/multiprocessing (i.e. issue several GET requests at once)?
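For reference, the simplest form of what the question asks for can be sketched with the standard library's thread pool. This is a hypothetical illustration, not the asker's code: the `fetch` helper uses `urllib` as a stand-in for a mechanize `Browser`, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # One blocking GET; stands in for a mechanize Browser.open() call.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def fetch_all(urls, fetch_fn=fetch, workers=8):
    # Issue several GETs at once; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_fn, urls))
```

Because the work is network-bound, threads overlap the waiting time even under the GIL.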


2 Answers


Use gevent or eventlet for concurrent network IO.
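A minimal sketch of the gevent approach, assuming gevent is installed; the `fetch` helper and its use of `urllib` are illustrative, not part of the answer:

```python
from gevent import monkey
monkey.patch_all()  # patch stdlib sockets so blocking IO yields to other greenlets

import gevent
import urllib.request

def fetch(url):
    # Blocks only this greenlet while waiting on the network.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def fetch_concurrently(urls, fetch_fn=fetch):
    # Spawn one greenlet per URL and wait for all of them;
    # results come back in the same order as the input URLs.
    jobs = [gevent.spawn(fetch_fn, u) for u in urls]
    gevent.joinall(jobs)
    return [job.value for job in jobs]
```

Greenlets are much cheaper than OS threads, so spawning one per URL is idiomatic here.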

Answered 2011-10-23T15:24:58.247

If you want industrial-strength web scraping in Python, look at scrapy. It uses Twisted for asynchronous communication and is very fast; crawling 50 pages per second is not an unrealistic expectation.

Answered 2011-10-26T11:57:08.037