I have developed a PuppeteerCrawler actor and want to inject some common code that should be available on all pages. I found the Apify.utils.puppeteer.injectFile method, which works fine if the code is injected on every 'domcontentloaded' event. However, I only want to inject it once. For that purpose there is a 'surviveNavigations' option, which should cause the code to be re-injected into every page.
Unfortunately, this option does not work for me. Please find some test code below that demonstrates my problem: the injected 'testinject.js' is found on the first page, but not on the second.
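For reference, the per-navigation variant that does work for me looks roughly like this (only a minimal sketch of the gotoFunction, without the request queue setup):

const Apify = require('apify');
const { puppeteer } = Apify.utils;

// Working variant: re-inject the file on every 'domcontentloaded' event.
const gotoFunction = async ({ page, request }) => {
    page.on('domcontentloaded', async () => {
        await puppeteer.injectFile(page, 'testinject.js');
    });
    return page.goto(request.url, { waitUntil: 'domcontentloaded' });
};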
What is wrong with my code?
Cheers, Wolfgang
Here is the test crawler. It should open two pages, inject testinject.js ONCE, and execute it:
const Apify = require('apify');
const { log, puppeteer } = Apify.utils;

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    // Please replace the URLs with existing ones, if necessary!
    // See here: '...'
    await requestQueue.addRequest({ url: '...' });
    await requestQueue.addRequest({ url: '...' });

    let isAlreadyInjected = false;

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        maxConcurrency: 1,
        gotoFunction: async ({ page, request }) => {
            page.on('domcontentloaded', async () => {
                // Inject the file only once; surviveNavigations should make it
                // available on all later pages as well.
                if (!isAlreadyInjected) {
                    await puppeteer.injectFile(page, 'testinject.js', { surviveNavigations: true });
                    isAlreadyInjected = true;
                }
            });
            return page.goto(request.url, {
                waitUntil: 'domcontentloaded',
            });
        },
        handlePageFunction: async ({ request, page }) => {
            // Check whether the injected function exists on the current page.
            const finding = await page.evaluate(() => {
                try {
                    return testinject();
                } catch (err) {
                    return 'Test inject was NOT found!';
                }
            });
            log.info(`${finding} (${page.url()})`);
        },
        handleFailedRequestFunction: async ({ request }) => {
            log.info(`Failed request:\t${request.url}`);
        },
    });

    await crawler.run();
});
And here is 'testinject.js', the file to be injected:
// Injected helper: simply confirms that the injection reached the page.
testinject = function () {
    return 'Test inject: I was found!';
};