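I'm very new to puppeteer-cluster. My goal is to crawl a list of 100 sites indefinitely, so once I reach the 100th link the script starts over (ideally reusing the same cluster instance). Is there a better or more correct way to do this? I was thinking that deliberately running an infinite loop (and rotating through the elements) might be easier. Any input would be greatly appreciated.
Here is my code: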
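const { Cluster } = require('puppeteer-cluster');

// `sites` is assumed to be defined elsewhere as an array of objects,
// each with a `product_urls` array of URL strings.

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 20,
        monitor: true
    });

    // Extracts document.title of the crawled pages
    await cluster.task(async ({ page, data: url }) => {
        await page.goto(url, { waitUntil: 'domcontentloaded' });
        const pageTitle = await page.evaluate(() => document.title);
        console.log(pageTitle);
    });

    // In case of problems, log them
    cluster.on('taskerror', (err, data) => {
        console.log(`Error crawling ${data}: ${err.message}`);
    });

    // Queue every product URL, then wait until the cluster has processed them all
    async function crawl() {
        for (const site of sites) {
            for (const url of site.product_urls) {
                // queue() reports failures through 'taskerror';
                // execute() would reject its promise instead
                cluster.queue(url);
            }
        }
        await cluster.idle();
    }

    // Crawl the full list, pause 5 seconds, then start over with the same cluster.
    // The timer must call resolve, otherwise the awaited promise never settles.
    while (true) {
        await crawl();
        await new Promise(resolve => setTimeout(resolve, 5000));
    }
})();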
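
To make the "loop over the list and start again" idea easier to reason about, here is a minimal sketch that separates the looping logic from the cluster itself. The helpers `crawlInRounds` and `collectUrls`, and the `pauseMs` and `maxRounds` options, are hypothetical names introduced only for illustration. The sketch assumes nothing about the cluster beyond the `queue(url)` and `idle()` methods that puppeteer-cluster's `Cluster` exposes.

```javascript
// Hypothetical helper: flattens the `sites` structure from the question
// into a single array of URLs.
function collectUrls(sites) {
    return sites.flatMap(site => site.product_urls);
}

// Hypothetical helper: feeds the same URL list to a cluster-like object
// round after round. `cluster` only needs queue(url) and idle().
// `maxRounds` defaults to Infinity, which means "never stop".
async function crawlInRounds(cluster, urls, { pauseMs = 5000, maxRounds = Infinity } = {}) {
    let rounds = 0;
    while (rounds < maxRounds) {
        for (const url of urls) {
            cluster.queue(url);
        }
        // Wait until every queued job of this round has finished
        await cluster.idle();
        rounds += 1;
        // Pause between rounds, but not after the final one
        if (rounds < maxRounds) {
            await new Promise(resolve => setTimeout(resolve, pauseMs));
        }
    }
    return rounds;
}
```

With a real cluster this would be called as `await crawlInRounds(cluster, collectUrls(sites));`. Passing a finite `maxRounds` makes the loop stoppable, after which `await cluster.close();` can shut the browser down cleanly.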