I'm trying to scrape a web page that contains 200+ <li class="classToGet"> elements, which are loaded with AJAX as one scrolls down. When I read the site's source with urllib2.urlopen(url).read() I can only get the initial 100 <li>s.

When I turn JavaScript off in my browser and go to the page, all 200+ <li>s are displayed. How do I disable JavaScript for urllib2 as it loads the page?

Thanks for the help.

1 Answer

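I think your problem is related to the HTTP User-Agent header. I did a small project that fetched images from Google Images. At first, I sent the following User-Agent: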

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36

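However, that returned a Pinterest-style page, which was not what I wanted, because that kind of page has to fetch its content afterwards. So I changed the User-Agent value to a different one: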

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)

After that, it worked fine and gave me the page I wanted.
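For what it's worth, here is a minimal sketch of how the header could be set. It is written for Python 3, where urllib2 became urllib.request; on Python 2 the equivalent call would be urllib2.Request(url, headers=...). The helper names build_request and fetch and the example.com URL are placeholders I made up for illustration, and whether your site actually returns the full list for this User-Agent is something you would have to try.

```python
import urllib.request

# The older Firefox User-Agent string quoted in the answer above.
OLD_FIREFOX_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) "
    "Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)"
)


def build_request(url, user_agent=OLD_FIREFOX_UA):
    """Return a Request object that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=OLD_FIREFOX_UA, timeout=10):
    """Fetch the page body as bytes. This performs a real network call."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical placeholder URL; replace it with the page you are scraping.
    req = build_request("http://example.com/list")
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
```

Calling fetch(url) with your real URL and counting the <li class="classToGet"> items in the result should tell you quickly whether the server sends the non-JavaScript version of the page for this User-Agent.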

Answered 2013-07-15T16:53:02.883