python - Using Scrapy to get "Next Page" data

Question

I need to scrape review data from a product website, but its user data is paginated. Each page holds 10 reviews, and there are about 100 pages. How can I crawl all of them?

My plan is to use yield with Request to follow the "Next Page" link and then extract the data with XPath. However, I can't get to the next page to extract its data.

Here is the HTML for the "Next Page" link:

<div class="xs-pagebar clearfix">
     <div class="Pagecon">
          <div class="Pagenum">
               <a class="pre-page pre-disable">
               <a class="pre-page pre-disable">
               <span class="curpage">1</span>
               <a href="#" onclick="tosubmits(2);return false;">2</a>
               <a href="#" onclick="tosubmits(3);return false;">3</a>
               <span class="elli">...</span>
               <a href="#" class="next-page" onclick="tosubmits('2');return false;">Next Page</a>
               <a href="#" onclick="tosubmits('94');return false;">Final Page</a>
           </div>
     </div>
</div>

What exactly does href="#" mean?

score 0 · Accepted Answer

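Unfortunately, you won't be able to do this with Scrapy alone, since it does not execute JavaScript. href="#" is an anchor link that points nowhere; it is only there to make the element look like a link. What actually happens on a click is that the JavaScript handler in onclick runs, here tosubmits(...). You will need some way of executing JavaScript for your use case. You may want to look into Splinter for this.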
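
To illustrate the point about onclick, here is a minimal sketch showing that the page numbers live in the onclick attribute rather than in href. It uses only the standard library and the markup from the question; the helper names page_numbers and next_page are made up for this example. It does not load the next page: what tosubmits() does with the number is not shown in the question, so any way of turning that number into a request would be an assumption that has to be checked against the site itself.

```python
import re

# Pagination markup copied from the question (trimmed to the links that matter).
PAGEBAR_HTML = """
<div class="Pagenum">
    <span class="curpage">1</span>
    <a href="#" onclick="tosubmits(2);return false;">2</a>
    <a href="#" onclick="tosubmits(3);return false;">3</a>
    <a href="#" class="next-page" onclick="tosubmits('2');return false;">Next Page</a>
    <a href="#" onclick="tosubmits('94');return false;">Final Page</a>
</div>
"""

# Matches both tosubmits(2) and tosubmits('2') and captures the page number.
TOSUBMITS_RE = re.compile(r"tosubmits\(\s*'?(\d+)'?\s*\)")


def page_numbers(html):
    """Return every page number passed to tosubmits(), in document order."""
    return [int(n) for n in TOSUBMITS_RE.findall(html)]


def next_page(html):
    """Return the page number behind the 'next-page' link, or None if absent."""
    for tag in re.findall(r"<a\b[^>]*>", html):
        if 'class="next-page"' in tag:
            match = TOSUBMITS_RE.search(tag)
            if match:
                return int(match.group(1))
    return None


pages = page_numbers(PAGEBAR_HTML)
print("page numbers found:", pages)
print("next page:", next_page(PAGEBAR_HTML))
print("last page:", max(pages))
```

Every href in the pagination bar is "#", so following links the usual way gives a spider nothing to request. The only usable information is the argument to tosubmits(), which is why the click has to be performed by something that runs JavaScript, such as a browser driven by Splinter.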
