django - django-dynamic-scraper RANGE_FUNCT pagination

Question

I am using django-dynamic-scraper in one of my applications. I have read the documentation, and here is my setup:

The object class URL I am using is: http://www.example.com/products/brandname_products.html

The pagination on the site looks like this:

Page 1: http://www.example.com/products/brandname_products.html
Page 2: http://www.example.com/products/brandname_products2.html
Page 3: http://www.example.com/products/brandname_products3.html
Page 4: http://www.example.com/products/brandname_products4.html

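The brandname part of the URLs above is dynamic and depends on the brand's product page. I can't use a separate scraper for each brand because there are more than 10,000 brands, so I am trying to use a single scraper object.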

In the scraper object, I defined the pagination options as follows:

pagination_type: RANGE_FUNCT
pagination_append_str: _products{page}.html
pagination_page_replace: 1,100,2
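To see which suffixes a configuration like the one above produces, here is a minimal sketch. It assumes, as the django-dynamic-scraper docs describe for RANGE_FUNCT, that the comma-separated pagination_page_replace values are passed as arguments to Python's built-in range() and that each resulting number is substituted into the {page} placeholder of pagination_append_str. The helper name page_suffixes is hypothetical; it mimics the behaviour and does not call the library.

```python
def page_suffixes(append_str: str, page_replace: str) -> list[str]:
    """Expand a RANGE_FUNCT-style config into the strings to be appended.

    Hypothetical helper: it imitates, but does not call, django-dynamic-scraper.
    """
    # "1,100,2" -> [1, 100, 2] -> range(1, 100, 2); int() tolerates spaces like " 100"
    args = [int(part) for part in page_replace.split(",")]
    # Fill the {page} placeholder with every value the range yields
    return [append_str.format(page=page) for page in range(*args)]


suffixes = page_suffixes("_products{page}.html", "1,100,2")
print(suffixes[:3])  # ['_products1.html', '_products3.html', '_products5.html']
```

Under Python's range(start, stop, step) semantics, a step of 2 visits every other page number; consecutive pages 2, 3, 4, … would need arguments such as 2,100 (an illustrative value, not part of the original setup).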

But the scraper requests the following pagination URLs:

http://www.example.com/products/brandname_products.html_products2.html
http://www.example.com/products/brandname_products.html_products3.html
http://www.example.com/products/brandname_products.html_products4.html

instead of:

http://www.example.com/products/brandname_products2.html
http://www.example.com/products/brandname_products3.html
http://www.example.com/products/brandname_products4.html

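Question: Why is the replacement string appended to the end of the URL instead of actually replacing _products.html in the object class URL? What am I doing wrong, and how can I fix this?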

score 4 · Accepted Answer

The pagination_append_str option is called that because the string is appended to the base URL, not substituted into it! :-)

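So everything is working correctly. You just have to remove _products.html from your base URL, so that the final URL is assembled correctly without that part of the URL being doubled.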
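To illustrate the answer, here is a minimal sketch of the concatenation, assuming the final pagination URL is simply the base URL followed by pagination_append_str with {page} filled in. The function name pagination_urls is hypothetical; it only reproduces the behaviour described above and is not part of django-dynamic-scraper.

```python
def pagination_urls(base_url: str, append_str: str, pages: range) -> list[str]:
    """Append append_str (with {page} filled in) to base_url for each page.

    Hypothetical helper illustrating plain string concatenation.
    """
    return [base_url + append_str.format(page=page) for page in pages]


append_str = "_products{page}.html"
pages = range(2, 5)  # pages 2, 3, 4

# Base URL still ends in _products.html, so the suffix appears twice
wrong = pagination_urls("http://www.example.com/products/brandname_products.html", append_str, pages)
# Base URL with _products.html removed, so the suffix appears exactly once
right = pagination_urls("http://www.example.com/products/brandname", append_str, pages)

print(wrong[0])  # http://www.example.com/products/brandname_products.html_products2.html
print(right[0])  # http://www.example.com/products/brandname_products2.html
```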
