web-scraping-language - Web Scraping Language: how do I scrape across paginated results?

Question

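I am trying to run the following script: go to Flipkart, crawl all product links, and extract the product, price, and description. However, it only scrapes one page. I want to repeat the scrape on every page, e.g. pages 1, 2, 3, and so on.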

GOTO flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off
CRAWL //div[2]/div[2]/div[1]/div//div[1]/a[@class="_2cLu-l"][1]
EXTRACT {
  "product": "//span[@class=\"_35KyD6\"][1]",
  "price": "//div[@class=\"_1vC4OE _3qQ9m1\"][1]",
  "description": "//div[@class=\"_3u-uqB\"][1]"
}

score 1 · Accepted Answer

You need to prefix the crawl expression with a paginator of the form [[xpath_for_nextpage_element]]. In this case the XPath for the "next page" link is //nav/a[11]/span. Wrap [[ and ]] around it and put it right after the CRAWL keyword, before the product-link XPath. So we get: [[//nav/a[11]/span]]

GOTO flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off
CRAWL [[//nav/a[11]/span]] //div[2]/div[2]/div[1]/div//div[1]/a[@class="_2cLu-l"][1]
EXTRACT {
  "product": "//span[@class=\"_35KyD6\"][1]",
  "price": "//div[@class=\"_1vC4OE _3qQ9m1\"][1]",
  "description": "//div[@class=\"_3u-uqB\"][1]"
}

This is now essentially a scraper that collects the product information from every results page.
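
For intuition, the effect of the paginator can be thought of as a "follow the next-page link until there is none" loop around the crawl step. The Python sketch below only illustrates that idea and is not how Web Scraping Language is implemented. The in-memory SITE data, the page paths, and the crawl_all_pages function are all hypothetical, and the stopping rules (no next link, a repeated page, or a page cap) are assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory stand-in for a paginated listing:
# each page holds its product links and the link to the next page (or None).
SITE: Dict[str, dict] = {
    "/search?page=1": {"products": ["/p/a", "/p/b"], "next": "/search?page=2"},
    "/search?page=2": {"products": ["/p/c"], "next": "/search?page=3"},
    "/search?page=3": {"products": ["/p/d"], "next": None},
}


def crawl_all_pages(site: Dict[str, dict], start: str, max_pages: int = 100) -> List[str]:
    """Collect product links from every listing page reachable via 'next' links."""
    collected: List[str] = []
    seen: Set[str] = set()
    url: Optional[str] = start
    # Stop when there is no next link, when a page repeats, or at the page cap.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = site[url]
        collected.extend(page["products"])  # the "crawl product links" step
        url = page["next"]                  # the "follow the paginator" step
    return collected


if __name__ == "__main__":
    print(crawl_all_pages(SITE, "/search?page=1"))
```

The seen set and the max_pages cap are there so that a "next" link pointing back to an earlier page cannot cause an endless loop.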
